POSITION DEPENDENT RECOGNITION OF
GNN NUCLEOTIDE TRIPLETS BY ZINC FINGERS 

CROSS-REFERENCES TO RELATED APPLICATIONS 
The present application is a continuation-in-part of copending U.S. Patent 
Application Serial No. 09/535,008, filed March 23, 2000, which application claims the 
benefit of U.S. provisional applications 60/126,238, filed March 24, 1999, 60/126,239 
filed March 24, 1999, 60/146,595 filed July 30, 1999 and 60/146,615 filed July 30, 1999. 
The present application is also a continuation-in-part of copending U.S. Patent 
Application Serial No. 09/716,637, filed November 20, 2000. The disclosures of all of 
the aforementioned applications are hereby incorporated by reference in their entireties 
for all purposes. 



BACKGROUND 

Zinc finger proteins (ZFPs) are proteins that can bind to DNA in a sequence- 
specific manner. Zinc fingers were first identified in the transcription factor TFIIIA from 
the oocytes of the African clawed toad, Xenopus laevis. An exemplary motif 
characterizing one class of these proteins (C2H2 class) is
-Cys-(X)2-4-Cys-(X)12-His-(X)3-5-His (where X is any amino acid) (SEQ ID NO:1). A single finger domain is about 30
amino acids in length, and several structural studies have demonstrated that it contains an 
alpha helix containing the two invariant histidine residues and two invariant cysteine 
residues in a beta turn co-ordinated through zinc. To date, over 10,000 zinc finger 
sequences have been identified in several thousand known or putative transcription 
factors. Zinc finger domains are involved not only in DNA-recognition, but also in RNA 
binding and in protein-protein binding. Current estimates are that this class of molecules 
will constitute about 2% of all human genes. 
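
As an illustration of how the exemplary C2H2 motif can be used to scan a sequence, the following is a minimal sketch that assumes the motif reads Cys-(X)2-4-Cys-(X)12-His-(X)3-5-His and translates it into a regular expression. The function name and the toy sequence are hypothetical and are not taken from the disclosure; the toy sequence was constructed only so that it satisfies the pattern.

```python
import re

# Assumed reading of the motif: Cys-(X)2-4-Cys-(X)12-His-(X)3-5-His,
# where X is any amino acid (one-letter codes).
C2H2_MOTIF = re.compile(r"C.{2,4}C.{12}H.{3,5}H")


def find_c2h2_motifs(protein):
    """Return (start index, matched text) for each non-overlapping motif hit."""
    return [(m.start(), m.group()) for m in C2H2_MOTIF.finditer(protein.upper())]


# Hypothetical toy sequence, not a sequence from the disclosure.
toy = "MKCPECGKSFSRSDHLARHQRTHTGEK"
hits = find_c2h2_motifs(toy)
print(hits)
```

A pattern match of this kind only flags candidate fingers; it says nothing about whether a matched segment folds or binds DNA.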

The x-ray crystal structure of Zif268, a three-finger domain from a murine 
transcription factor, has been solved in complex with a cognate DNA sequence and 
shows that each finger can be superimposed on the next by a periodic rotation. The 
structure suggests that each finger interacts independently with DNA over 3 base-pair 
intervals, with side-chains at positions -1, 2, 3 and 6 on each recognition helix making
contacts with their respective DNA triplet subsites. The amino terminus of Zif268 is
situated at the 3' end of the DNA strand with which it makes most contacts. Some zinc
fingers can bind to a fourth base in a target segment. If the strand with which a zinc
finger protein makes most contacts is designated the target strand, some zinc finger
proteins bind to a three base triplet in the target strand and a fourth base on the nontarget
strand. The fourth base is complementary to the base immediately 3' of the three base
subsite.

The structure of the Zif268-DNA complex also suggested that the DNA sequence 
specificity of a zinc finger protein might be altered by making amino acid substitutions at
the four helix positions (-1, 2, 3 and 6) on each of the zinc finger recognition helices.

Phage display experiments using zinc finger combinatorial libraries to test this
observation were published in a series of papers in 1994 (Rebar et al., Science 263,
671-673 (1994); Jamieson et al., Biochemistry 33, 5689-5695 (1994); Choo et al., PNAS 91,
11163-11167 (1994)). Combinatorial libraries were constructed with randomized
side-chains in either the first or middle finger of Zif268 and then used to select for an altered
Zif268 binding site in which the appropriate DNA sub-site was replaced by an altered
DNA triplet. Further, correlation between the nature of introduced mutations and the
resulting alteration in binding specificity gave rise to a partial set of substitution rules for
design of ZFPs with altered binding specificity.

Greisman & Pabo, Science 275, 657-661 (1997) discuss an elaboration of the 
phage display method in which each finger of a Zif268 was successively randomized and 
selected for binding to a new triplet sequence. This paper reported selection of ZFPs for a 
nuclear hormone response element, a p53 target site and a TATA box sequence. 

A number of papers have reported attempts to produce ZFPs to modulate
particular target sites. For example, Choo et al., Nature 372, 645 (1994), report an
attempt to design a ZFP that would repress expression of a bcr-abl oncogene. The target
segment to which the ZFPs would bind was a nine base sequence 5'GCA GAA GCC3'
chosen to overlap the junction created by a specific oncogenic translocation fusing the
genes encoding bcr and abl. The intention was that a ZFP specific to this target site
would bind to the oncogene without binding to abl or bcr component genes. The authors
used phage display to screen a mini-library of variant ZFPs for binding to this target
segment. A variant ZFP thus isolated was then reported to repress expression of a stably
transfected bcr-abl construct in a cell line.

Pomerantz et al., Science 267, 93-96 (1995) reported an attempt to design a novel 
DNA binding protein by fusing two fingers from Zif268 with a homeodomain from Oct-1.
The hybrid protein was then fused with a transcriptional activator for expression as a
chimeric protein. The chimeric protein was reported to bind a target site representing a 
hybrid of the subsites of its two components. The authors then constructed a reporter 
vector containing a luciferase gene operably linked to a promoter and a hybrid site for the 
chimeric DNA binding protein in proximity to the promoter. The authors reported that 
their chimeric DNA binding protein could activate expression of the luciferase gene. 

Liu et al., PNAS 94, 5525-5530 (1997) report forming a composite zinc finger
protein by using a peptide spacer to link two component zinc finger proteins each having
three fingers. The composite protein was then further linked to a transcriptional activation
domain. It was reported that the resulting chimeric protein bound to a target site formed
from the target segments bound by the two component zinc finger proteins. It was further 
reported that the chimeric zinc finger protein could activate transcription of a reporter 
gene when its target site was inserted into a reporter plasmid in proximity to a promoter 
operably linked to the reporter. 

Choo et al., WO 98/53058, WO 98/53059, and WO 98/53060 (1998) discuss
selection of zinc finger proteins to bind to a target site within the HIV Tat gene. Choo et 
al. also discuss selection of a zinc finger protein to bind to a target site encompassing a 
site of a common mutation in the oncogene ras. The target site within ras was thus 
constrained by the position of the mutation. 

Previously-disclosed methods for the design of sequence-specific zinc finger 
proteins have often been based on modularity of individual zinc fingers; i.e., the ability 
of a zinc finger to recognize the same target subsite regardless of the location of the 
finger in a multi-finger protein. Although, in many instances, a zinc finger retains the
same sequence specificity regardless of its location within a multi-finger protein, in
certain cases the sequence specificity of a zinc finger depends on its position. For
example, it is possible for a finger to recognize a particular triplet sequence when it is
present as finger 1 of a three-finger protein, but to recognize a different triplet sequence
when present as finger 2 of a three-finger protein.

Attempts to address situations in which a zinc finger behaves in a non-modular 
fashion (i.e., its sequence specificity depends upon its location in a multi-finger protein) 
have, to date, involved strategies employing randomization of key binding residues in 
multiple adjacent zinc fingers, followed by selection. See, for example, Isalan et al. 
(2001) Nature Biotechnol. 19:656-660. However, methods for rational design of 
polypeptides containing non-modular zinc fingers have not heretofore been described. 

SUMMARY 

The present disclosure provides compositions comprising and methods involving 
position dependent recognition of GNN nucleotide triplets by zinc fingers. 

Thus, provided herein is a zinc finger protein that binds to a target site, said zinc
finger protein comprising a first (F1), a second (F2), and a third (F3) zinc finger, ordered
F1, F2, F3 from N-terminus to C-terminus, said target site comprising, in 3' to 5'
direction, a first (S1), a second (S2), and a third (S3) target subsite, each target subsite
having the nucleotide sequence GNN, wherein if S1 comprises GAA, F1 comprises the
amino acid sequence QRSNLVR; if S2 comprises GAA, F2 comprises the amino acid
sequence QSGNLAR; if S3 comprises GAA, F3 comprises the amino acid sequence
QSGNLAR; if S1 comprises GAG, F1 comprises the amino acid sequence RSDNLAR; if
S2 comprises GAG, F2 comprises the amino acid sequence RSDNLAR; if S3 comprises
GAG, F3 comprises the amino acid sequence RSDNLTR; if S1 comprises GAC, F1
comprises the amino acid sequence DRSNLTR; if S2 comprises GAC, F2 comprises the
amino acid sequence DRSNLTR; if S3 comprises GAC, F3 comprises the amino acid
sequence DRSNLTR; if S1 comprises GAT, F1 comprises the amino acid sequence
QSSNLAR; if S2 comprises GAT, F2 comprises the amino acid sequence TSGNLVR; if
S3 comprises GAT, F3 comprises the amino acid sequence TSANLSR; if S1 comprises
GGA, F1 comprises the amino acid sequence QSGHLAR; if S2 comprises GGA, F2
comprises the amino acid sequence QSGHLQR; if S3 comprises GGA, F3 comprises the
amino acid sequence QSGHLQR; if S1 comprises GGG, F1 comprises the amino acid
sequence RSDHLAR; if S2 comprises GGG, F2 comprises the amino acid sequence
RSDHLSR; if S3 comprises GGG, F3 comprises the amino acid sequence RSDHLSR; if
S1 comprises GGC, F1 comprises the amino acid sequence DRSHLTR; if S2 comprises
GGC, F2 comprises the amino acid sequence DRSHLAR; if S1 comprises GGT, F1
comprises the amino acid sequence QSSHLTR; if S2 comprises GGT, F2 comprises the
amino acid sequence TSGHLSR; if S3 comprises GGT, F3 comprises the amino acid
sequence TSGHLVR; if S1 comprises GCA, F1 comprises the amino acid sequence
QSGSLTR; if S2 comprises GCA, F2 comprises QSGDLTR; if S3 comprises GCA, F3
comprises QSGDLTR; if S1 comprises GCG, F1 comprises the amino acid sequence
RSDDLTR; if S2 comprises GCG, F2 comprises the amino acid sequence RSDDLQR; if
S3 comprises GCG, F3 comprises the amino acid sequence RSDDLTR; if S1 comprises
GCC, F1 comprises the amino acid sequence ERGTLAR; if S2 comprises GCC, F2
comprises the amino acid sequence DRSDLTR; if S3 comprises GCC, F3 comprises the
amino acid sequence DRSDLTR; if S1 comprises GCT, F1 comprises the amino acid
sequence QSSDLTR; if S2 comprises GCT, F2 comprises the amino acid sequence
QSSDLTR; if S3 comprises GCT, F3 comprises the amino acid sequence QSSDLQR; if
S1 comprises GTA, F1 comprises the amino acid sequence QSGALTR; if S2 comprises
GTA, F2 comprises the amino acid sequence QSGALAR; if S1 comprises GTG, F1
comprises the amino acid sequence RSDALTR; if S2 comprises GTG, F2 comprises the
amino acid sequence RSDALSR; if S3 comprises GTG, F3 comprises the amino acid
sequence RSDALTR; if S1 comprises GTC, F1 comprises the amino acid sequence
DRSALAR; if S2 comprises GTC, F2 comprises the amino acid sequence DRSALAR;
and if S3 comprises GTC, F3 comprises the amino acid sequence DRSALAR.

Also provided are methods of designing a zinc finger protein comprising a first
(F1), a second (F2), and a third (F3) zinc finger, ordered F1, F2, F3 from N-terminus to
C-terminus that binds to a target site comprising, in 3' to 5' direction, a first (S1), a
second (S2), and a third (S3) target subsite, each target subsite having the nucleotide
sequence GNN, the method comprising the steps of (a) selecting the F1 zinc finger such
that it binds to the S1 target subsite, wherein if S1 comprises GAA, F1 comprises the
amino acid sequence QRSNLVR; if S1 comprises GAG, F1 comprises the amino acid
sequence RSDNLAR; if S1 comprises GAC, F1 comprises the amino acid sequence
DRSNLTR; if S1 comprises GAT, F1 comprises the amino acid sequence QSSNLAR; if
S1 comprises GGA, F1 comprises the amino acid sequence QSGHLAR; if S1 comprises
GGG, F1 comprises the amino acid sequence RSDHLAR; if S1 comprises GGC, F1
comprises the amino acid sequence DRSHLTR; if S1 comprises GGT, F1 comprises the
amino acid sequence QSSHLTR; if S1 comprises GCA, F1 comprises QSGSLTR; if S1
comprises GCG, F1 comprises RSDDLTR; if S2 comprises GCG, F2 comprises
RSDDLQR; if S1 comprises GCC, F1 comprises ERGTLAR; if S1 comprises GCT, F1
comprises the amino acid sequence QSSDLTR; if S1 comprises GTA, F1 comprises the
amino acid sequence QSGALTR; if S1 comprises GTG, F1 comprises the amino acid
sequence RSDALTR; if S1 comprises GTC, F1 comprises the amino acid sequence
DRSALAR; (b) selecting the F2 zinc finger such that it binds to the S2 target subsite,
wherein if S2 comprises GAA, F2 comprises the amino acid sequence QSGNLAR; if S2
comprises GAG, F2 comprises the amino acid sequence RSDNLAR; if S2 comprises
GAC, F2 comprises the amino acid sequence DRSNLTR; if S2 comprises GAT, F2
comprises the amino acid sequence TSGNLVR; if S2 comprises GGA, F2 comprises the
amino acid sequence QSGHLQR; if S2 comprises GGG, F2 comprises the amino acid
sequence RSDHLSR; if S2 comprises GGC, F2 comprises the amino acid sequence
DRSHLAR; if S2 comprises GGT, F2 comprises the amino acid sequence TSGHLSR; if
S2 comprises GCA, F2 comprises the amino acid sequence QSGDLTR; if S2 comprises
GCC, F2 comprises the amino acid sequence DRSDLTR; if S2 comprises GCT, F2
comprises the amino acid sequence QSSDLTR; if S2 comprises GTA, F2 comprises the
amino acid sequence QSGALAR; if S2 comprises GTG, F2 comprises the amino acid
sequence RSDALSR; if S2 comprises GTC, F2 comprises the amino acid sequence
DRSALAR; and (c) selecting the F3 zinc finger such that it binds to the S3 target subsite,
wherein if S3 comprises GAA, F3 comprises the amino acid sequence QSGNLAR; if S3
comprises GAG, F3 comprises the amino acid sequence RSDNLTR; if S3 comprises
GAC, F3 comprises the amino acid sequence DRSNLTR; if S3 comprises GAT, F3
comprises the amino acid sequence TSANLSR; if S3 comprises GGA, F3 comprises the
amino acid sequence QSGHLQR; if S3 comprises GGG, F3 comprises RSDHLSR; if S3
comprises GGT, F3 comprises the amino acid sequence TSGHLVR; if S3 comprises
GCA, F3 comprises the amino acid sequence QSGDLTR; if S3 comprises GCG, F3
comprises the amino acid sequence RSDDLTR; if S3 comprises GCC, F3 comprises the
amino acid sequence DRSDLTR; if S3 comprises GCT, F3 comprises the amino acid
sequence QSSDLQR; if S3 comprises GTG, F3 comprises RSDALTR; and if S3
comprises GTC, F3 comprises the amino acid sequence DRSALAR;

thereby designing a zinc finger protein that binds to a target site. 
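
The three selection steps can be pictured as a position-indexed table lookup. The following is a minimal sketch under stated assumptions: the table holds only an excerpt of the triplet-to-helix assignments recited above, the target site is supplied as a 9-base string written 5' to 3', and the names `GNN_HELICES` and `design_helices` are hypothetical rather than part of the disclosure.

```python
# Excerpt of the position-dependent assignments recited above:
# triplet -> (F1 helix, F2 helix, F3 helix), residues -1 through +6.
GNN_HELICES = {
    "GAA": ("QRSNLVR", "QSGNLAR", "QSGNLAR"),
    "GAT": ("QSSNLAR", "TSGNLVR", "TSANLSR"),
    "GGA": ("QSGHLAR", "QSGHLQR", "QSGHLQR"),
    "GCA": ("QSGSLTR", "QSGDLTR", "QSGDLTR"),
    "GCG": ("RSDDLTR", "RSDDLQR", "RSDDLTR"),
    "GTG": ("RSDALTR", "RSDALSR", "RSDALTR"),
}


def design_helices(target_5_to_3):
    """Pick a recognition helix for F1, F2 and F3 of a three-finger protein."""
    site = target_5_to_3.upper()
    if len(site) != 9:
        raise ValueError("expected a 9-base target site")
    # Written 5' to 3', the site reads S3-S2-S1, because S1 is the 3'-most subsite.
    s3, s2, s1 = site[0:3], site[3:6], site[6:9]
    design = {}
    for position, (finger, subsite) in enumerate((("F1", s1), ("F2", s2), ("F3", s3))):
        if subsite not in GNN_HELICES:
            raise KeyError(f"{subsite} is not in this excerpt of the table")
        # The same triplet can map to a different helix at each finger position.
        design[finger] = GNN_HELICES[subsite][position]
    return design


print(design_helices("GCAGAAGAT"))
```

For the hypothetical site 5'GCAGAAGAT3', S1 is GAT, S2 is GAA and S3 is GCA, so the lookup returns a different helix for each finger even where a triplet has a position-independent neighbor in the table.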

In certain embodiments of the zinc finger proteins and methods described herein,
S1 comprises GAA and F1 comprises the amino acid sequence QRSNLVR. In other
embodiments, S2 comprises GAA and F2 comprises the amino acid sequence
QSGNLAR. In other embodiments, S3 comprises GAA and F3 comprises the amino acid
sequence QSGNLAR. In other embodiments, S1 comprises GAG and F1 comprises the
amino acid sequence RSDNLAR. In other embodiments, S2 comprises GAG and F2
comprises the amino acid sequence RSDNLAR. In other embodiments, S3 comprises
GAG and F3 comprises the amino acid sequence RSDNLTR. In other embodiments, S1
comprises GAC and F1 comprises the amino acid sequence DRSNLTR. In other
embodiments, S2 comprises GAC and F2 comprises the amino acid sequence
DRSNLTR. In other embodiments, S3 comprises GAC and F3 comprises the amino acid
sequence DRSNLTR. In other embodiments, S1 comprises GAT and F1 comprises the
amino acid sequence QSSNLAR. In other embodiments, S2 comprises GAT and F2
comprises the amino acid sequence TSGNLVR. In other embodiments, S3 comprises
GAT and F3 comprises the amino acid sequence TSANLSR. In other embodiments, S1
comprises GGA and F1 comprises the amino acid sequence QSGHLAR. In other
embodiments, S2 comprises GGA and F2 comprises the amino acid sequence
QSGHLQR. In other embodiments, S3 comprises GGA and F3 comprises the amino acid
sequence QSGHLQR. In other embodiments, S1 comprises GGG and F1 comprises the
amino acid sequence RSDHLAR. In other embodiments, S2 comprises GGG and F2
comprises the amino acid sequence RSDHLSR. In other embodiments, S3 comprises
GGG and F3 comprises the amino acid sequence RSDHLSR. In other embodiments, S1
comprises GGC and F1 comprises the amino acid sequence DRSHLTR. In other
embodiments, S2 comprises GGC and F2 comprises the amino acid sequence
DRSHLAR. In other embodiments, S1 comprises GGT and F1 comprises the amino acid
sequence QSSHLTR. In other embodiments, S2 comprises GGT and F2 comprises the
amino acid sequence TSGHLSR. In other embodiments, S3 comprises GGT and F3
comprises the amino acid sequence TSGHLVR. In other embodiments, S1 comprises
GCA and F1 comprises the amino acid sequence QSGSLTR. In other embodiments, S2
comprises GCA and F2 comprises the amino acid sequence QSGDLTR. In other
embodiments, S3 comprises GCA and F3 comprises the amino acid sequence
QSGDLTR. In other embodiments, S1 comprises GCG and F1 comprises the amino acid
sequence RSDDLTR. In other embodiments, S2 comprises GCG and F2 comprises the
amino acid sequence RSDDLQR. In other embodiments, S3 comprises GCG and F3
comprises the amino acid sequence RSDDLTR. In other embodiments, S1 comprises
GCC and F1 comprises the amino acid sequence ERGTLAR. In other embodiments, S2
comprises GCC and F2 comprises the amino acid sequence DRSDLTR. In other
embodiments, S3 comprises GCC and F3 comprises the amino acid sequence DRSDLTR.
In other embodiments, S1 comprises GCT and F1 comprises the amino acid sequence
QSSDLTR. In other embodiments, S2 comprises GCT and F2 comprises the amino acid
sequence QSSDLTR. In other embodiments, S3 comprises GCT and F3 comprises the
amino acid sequence QSSDLQR. In other embodiments, S1 comprises GTA and F1
comprises the amino acid sequence QSGALTR. In other embodiments, S2 comprises
GTA and F2 comprises the amino acid sequence QSGALAR. In other embodiments, S1
comprises GTG and F1 comprises the amino acid sequence RSDALTR. In other
embodiments, S2 comprises GTG and F2 comprises the amino acid sequence RSDALSR.
In other embodiments, S3 comprises GTG and F3 comprises the amino acid sequence
RSDALTR. In other embodiments, S1 comprises GTC and F1 comprises the amino acid
sequence DRSALAR. In other embodiments, S2 comprises GTC and F2 comprises the
amino acid sequence DRSALAR. In other embodiments, S3 comprises GTC and F3
comprises the amino acid sequence DRSALAR.

Also provided are polypeptides comprising any of the zinc finger proteins described
herein. In certain embodiments, the polypeptide further comprises at least one functional
domain. Also provided are polynucleotides encoding any of the polypeptides described
herein. Thus, also provided are nucleic acids encoding zinc fingers, including all of the
zinc fingers described above.

Also provided are segments of a zinc finger comprising a sequence of seven
contiguous amino acids as shown herein. Also provided are nucleic acids encoding any
of these segments and zinc fingers comprising the same.

Also provided are zinc finger proteins comprising first, second and third zinc 
fingers. The first, second and third zinc fingers comprise respectively first, second and 
third segments of seven contiguous amino acids as shown herein. Also provided are 
nucleic acids encoding such zinc finger proteins. 

BRIEF DESCRIPTION OF THE DRAWINGS 
Figure 1 shows results of site selection analysis of two representative zinc finger 
proteins (leftmost 4 columns) and measurements of binding affinity for each of these 
proteins to their intended target sequences and to variant target sequences (rightmost 3
columns). Analysis of ZFP1 is shown in the upper portion of the figure and analysis of
ZFP2 is shown in the lower portion of the figure. For the site selection analyses, the
amino acid sequences of residues -1 through +6 of the recognition helix of each of the
three component zinc fingers (F3, F2 and F1) are shown across the top row; the intended
target sequence (divided into finger-specific target subsites) is shown across the second
row, and a summary of the sequences bound is shown in the third row. Data for F3 is
shown in the second column, data for F2 is shown in the third column, and data for F1 is
shown in the fourth column.

For the binding affinity analyses, the designed target sequence for each ZFP
("cognate") and two related sequences ("Mt") are shown (column 6), along with the Kd
for binding of the ZFP to each of these sequences (column 7).

Figure 2 shows amino acid sequences of zinc finger recognition regions (amino 
acids -1 through +6 of the recognition helix) that bind to each of the 16 GNN triplet 
subsites. Three amino acid sequences are shown for each trinucleotide subsite; these 
correspond to optimal amino acid sequences for recognition of the subsite from each of 
the three positions (finger 1, F1; finger 2, F2; or finger 3, F3) in a three-finger zinc finger
protein. Amino acid sequences are from N-terminal to C-terminal; nucleotide sequences 
are from 5' to 3'. 
Also shown are site selection results for each of the 48 position-dependent GNN-
recognizing zinc fingers. These show the number of times a particular nucleotide was
present, at a given position, in a collection of oligonucleotide sequences bound by the
finger. For example, out of 15 oligonucleotides bound by a zinc finger protein with the
amino acid sequence QSGHLAR present at the finger 1 (F1) position, 15 contained a G
in the 5'-most position of the subsite, 15 contained a G in the middle position of the
subsite, while, at the 3'-most position of the subsite, 10 contained an A, 3 contained a G
and 2 contained a T. Accordingly, this particular amino acid sequence is optimal for
binding a GGA triplet from the F1 position.
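
The way a preferred triplet is read off site selection counts can be sketched as follows. This is an illustrative sketch only: the counts are the ones quoted above for QSGHLAR at the F1 position, while the function name `preferred_triplet` and the simple most-frequent-base rule are assumptions made for the example.

```python
# Nucleotide counts per subsite position (5'-most, middle, 3'-most) among the
# 15 selected oligonucleotides described above for QSGHLAR at the F1 position.
qsghlar_f1_counts = [
    {"G": 15},
    {"G": 15},
    {"A": 10, "G": 3, "T": 2},
]


def preferred_triplet(position_counts):
    """Return the most frequently selected base at each subsite position."""
    return "".join(max(counts, key=counts.get) for counts in position_counts)


def base_fractions(counts):
    """Convert raw counts at one position into fractions of the total."""
    total = sum(counts.values())
    return {base: n / total for base, n in counts.items()}


print(preferred_triplet(qsghlar_f1_counts))
print(base_fractions(qsghlar_f1_counts[2]))
```

The fractions make the degree of preference explicit: the first two positions are invariant in this data set, whereas the 3'-most position shows a majority for A alongside some G and T.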

Figures 3A, 3B and 3C show site selection data indicating positional dependence
of GCA-, GAT- and GGT-binding zinc fingers. The first and fourth (where applicable)
rows of each figure show portions of the amino acid sequence of a designed zinc finger
protein. Amino acid residues -1 through +6 of each α-helix are listed from left to right.
The second and fifth (where applicable) rows show the target sequence, divided into three
triplet subsites, one for each finger of the protein shown in the first and fourth (where
applicable) rows, respectively. The third and sixth (where applicable) rows show the
distribution of nucleotides in the oligonucleotides obtained by site selection with the
proteins shown in the first and fourth (where applicable) rows, respectively. Figure 3A
shows data for fingers designed to bind GCA; Figure 3B shows data for fingers designed 
to bind GAT; Figure 3C shows data for fingers designed to bind GGT. 

Figures 4A and 4B show properties of the engineered ZFP EP2C. Figure 4A 
shows site selection data. The first row provides the amino acid sequences of residues -1 
through +6 of the recognition helices for each of the three zinc fingers of the EP2C 
protein. The second row shows the target sequence (5' to 3'), with the distribution of
nucleotides in the oligonucleotides obtained by site selection indicated below the target 
sequence. 

Figure 4B shows in vitro and in vivo assays for the binding specificity of EP2C. 
The first three columns show in vitro measurements of binding affinity of EP2C to its 
intended target sequence and several related sequences. The first column gives the name 
of each sequence (2C0 is the intended target sequence, compare to Figure 4A). The 
second column shows the nucleotide sequence of various target sequences, with 
differences from the intended target sequence (2C0) highlighted. The third column
shows the Kd (in nM) for binding of EP2C to each of the target sequences. Kds were
determined by gel shift assays, using 2-fold dilution series of EP2C. The right side of the
figure (fourth column and bar graph) shows relative luciferase activities (normalized to
β-galactosidase levels) in stable cell lines in which expression of EP2C is inducible.
Cells were co-transfected with a vector containing a luciferase coding region under the
transcriptional control of the target sequence shown in the same row of the figure, and a
control vector encoding β-galactosidase. Luciferase and β-galactosidase levels were
measured after induction of EP2C expression. Triplicate samples were assayed and the 
standard deviations are shown in the bar graph. pGL3 is a luciferase-encoding vector 
lacking EP2C target sequences. 3B is another negative control, in which luciferase 
expression is under transcriptional control of sequences (3B) unrelated to the EP2C target 
sequence. 

DEFINITIONS 

A zinc finger DNA binding protein is a protein or segment within a larger protein 
that binds DNA in a sequence-specific manner as a result of stabilization of protein 
structure through coordination of a zinc ion. The term zinc finger DNA binding protein 
is often abbreviated as zinc finger protein or ZFP. 

Zinc finger proteins can be engineered to recognize a selected target sequence in a 
nucleic acid. Any method known in the art or disclosed herein can be used to construct 
an engineered zinc finger protein or a nucleic acid encoding an engineered zinc finger 
protein. These include, but are not limited to, rational design, selection methods (e.g.,
phage display), random mutagenesis, combinatorial libraries, computer design, affinity
selection, use of databases matching zinc finger amino acid sequences with target subsite 
nucleotide sequences, cloning from cDNA and/or genomic libraries, and synthetic 
constructions. An engineered zinc finger protein can comprise a new combination of 
naturally-occurring zinc finger sequences. Methods for engineering zinc finger proteins 
are disclosed in co-owned WO 00/41566 and WO 00/42219; as well as in WO 98/53057; 
WO 98/53058; WO 98/53059 and WO 98/53060; the disclosures of which are hereby 
incorporated by reference in their entireties. Methods for identifying preferred target 
sequences, and for engineering zinc finger proteins to bind to such preferred target
sequences, are disclosed in co-owned WO 00/42219. 

A designed zinc finger protein is a protein not occurring in nature whose 
design/composition results principally from rational criteria. Rational criteria for design 
include application of substitution rules and computerized algorithms for processing 
information in a database storing information of existing ZFP designs and binding data. 

A selected zinc finger protein is a protein not found in nature whose production 
results primarily from an empirical process such as phage display. 

The term naturally-occurring is used to describe an object that can be found in 
nature as distinct from being artificially produced by man. For example, a polypeptide or 
polynucleotide sequence that is present in an organism (including viruses) that can be 
isolated from a source in nature and which has not been intentionally modified by man in 
the laboratory is naturally-occurring. Generally, the term naturally-occurring refers to an 
object as present in a non-pathological (undiseased) individual, such as would be typical 
for the species. 

A nucleic acid is operably linked when it is placed into a functional relationship 
with another nucleic acid sequence. For instance, a promoter or enhancer is operably 
linked to a coding sequence if it increases the transcription of the coding sequence. 
Operably linked means that the DNA sequences being linked are typically contiguous 
and, where necessary to join two protein coding regions, contiguous and in reading 
frame. However, since enhancers generally function when separated from the promoter 
by up to several kilobases or more and intronic sequences may be of variable lengths, 
some polynucleotide elements may be operably linked but not contiguous. 

A specific binding affinity between, for example, a ZFP and a specific target site 
means a binding affinity of at least 1x10 M" . 

The terms "modulating expression," "inhibiting expression," and "activating
expression" of a gene refer to the ability of a zinc finger protein to activate or inhibit 
transcription of a gene. Activation includes prevention of subsequent transcriptional 
inhibition (i.e., prevention of repression of gene expression) and inhibition includes 
prevention of subsequent transcriptional activation (i.e., prevention of gene activation). 
Modulation can be assayed by determining any parameter that is indirectly or directly 
affected by the expression of the target gene. Such parameters include, e.g., changes in
RNA or protein levels, changes in protein activity, changes in product levels, changes in
downstream gene expression, changes in reporter gene transcription (luciferase, CAT,
beta-galactosidase, GFP (see, e.g., Misteli & Spector, Nature Biotechnology 15:961-964
(1997))); changes in signal transduction, phosphorylation and dephosphorylation,
receptor-ligand interactions, second messenger concentrations (e.g., cGMP, cAMP, IP3, 
and Ca2+), cell growth, neovascularization, in vitro, in vivo, and ex vivo. Such functional 
effects can be measured by any means known to those skilled in the art, e.g., 
measurement of RNA or protein levels, measurement of RNA stability, identification of 
downstream or reporter gene expression, e.g., via chemiluminescence, fluorescence, 
colorimetric reactions, antibody binding, inducible markers, ligand binding assays; 
changes in intracellular second messengers such as cGMP and inositol triphosphate (IP3); 
changes in intracellular calcium levels; cytokine release, and the like. 

A "regulatory domain" refers to a protein or a protein subsequence that has 
transcriptional modulation activity. Typically, a regulatory domain is covalently or non- 
covalently linked to a ZFP to modulate transcription. Alternatively, a ZFP can act alone, 
without a regulatory domain, or with multiple regulatory domains to modulate 
transcription. 

A D-able subsite within a target site has the motif 5'NNGK3'. A target site
containing one or more such motifs is sometimes described as a D-able target site. A 
zinc finger appropriately designed to bind to a D-able subsite is sometimes referred to as 
a D-able finger. Likewise a zinc finger protein containing at least one finger designed or 
selected to bind to a target site including at least one D-able subsite is sometimes referred 
to as a D-able zinc finger protein. 
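
Locating D-able subsites in a candidate target can be sketched with a short scan. The sketch below assumes the motif reads 5'NNGK3' and uses the standard IUPAC meanings of N (any base) and K (G or T); the function name `find_d_able_subsites` and the example sequence are hypothetical.

```python
import re

# 5'NNGK3' with IUPAC codes expanded: N = A, C, G or T; K = G or T.
# The lookahead lets overlapping four-base windows be reported.
D_ABLE = re.compile(r"(?=([ACGT]{2}G[GT]))")


def find_d_able_subsites(sequence_5_to_3):
    """Return (start index, four-base window) for every 5'NNGK3' match."""
    seq = sequence_5_to_3.upper()
    return [(m.start(), m.group(1)) for m in D_ABLE.finditer(seq)]


# Hypothetical example sequence, written 5' to 3'.
print(find_d_able_subsites("AAGTCCGG"))
```

A target site for which this scan returns at least one window would be a D-able target site in the sense defined above.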

DETAILED DESCRIPTION 

I. General 

Tables 1-5 list a collection of nonnaturally occurring zinc finger protein 
sequences and their corresponding target sites. The first column of each table is an 
internal reference number. The second column lists a 9 or 10 base target site bound by a 
three-finger zinc finger protein, with the target sites listed in 5' to 3' orientation. The 
third column provides SEQ ID NOs for the target site sequences listed in column 2. The 
fourth, sixth and eighth columns list amino acid residues from the first, second and third 
fingers, respectively, of a zinc finger protein which recognizes the target sequence listed 
in the second column. For each finger, seven amino acids, occupying positions -1 to +6 
of the finger, are listed. The numbering convention for zinc fingers is defined below. 
Columns 5, 7 and 9 provide SEQ ID NOs for the amino acid sequences listed in columns 
4, 6 and 8, respectively. The final column of each table lists the binding affinity (i.e., the 
Kd in nM) of the zinc finger protein for its target site. Binding affinities are measured as 
described below. 

Each finger binds to a triplet of bases within a corresponding target sequence. 
The first finger binds to the first triplet starting from the 3' end of a target site, the second 
finger binds to the second triplet, and the third finger binds the third (i.e., the 5'-most) 
triplet of the target sequence. For example, the RSDSLTS finger (SEQ ID NO: 646) of 
SBS# 201 (Table 2) binds to 5'TTG3', the ERSTLTR finger (SEQ ID NO: 851) binds 
to 5'GCC3' and the QRADLRR finger (SEQ ID NO: 1056) binds to 5'GCA3'. 

Table 6 lists a collection of consensus sequences for zinc fingers and the target 
sites bound by such sequences. Conventional one letter amino acid codes are used to 
designate amino acids occupying consensus positions. The symbol "X" designates a 
nonconsensus position that can in principle be occupied by any amino acid. In most zinc 
fingers of the C2H2 type, binding specificity is principally conferred by residues -1, +2, 
+3 and +6. Accordingly, consensus sequences determining binding specificity typically 
include at least these residues. Consensus sequences are useful for designing zinc fingers 
to bind to a given target sequence. Residues occupying other positions can be selected 
based on sequences in Tables 1-5, or other known zinc finger sequences. Alternatively, 
these positions can be randomized with a plurality of candidate amino acids and screened 
against one or more target sequences to refine or improve binding 
specificity. In general, the same consensus sequence can be used for design of a zinc 
finger regardless of the relative position of that finger in a multi-finger zinc finger 
protein. For example, the sequence RXDNXXR can be used to design an N-terminal, 
central or C-terminal finger of a three-finger protein. However, some consensus sequences 
are most suitable for designing a zinc finger to occupy a particular position in a multi-finger 
protein. For example, the consensus sequence RXDHXXQ is most suitable for 
designing a C-terminal finger of a three-finger protein. 
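
As an illustration only, a consensus sequence of the kind described above can be treated as a simple pattern in which each "X" matches any of the twenty standard amino acids. The following Python sketch shows one way to test whether a seven-residue recognition sequence (positions -1 to +6) fits a consensus. The function names and the test sequences in the usage note are hypothetical and are not taken from the tables.

```python
import re

# One-letter codes of the twenty standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def consensus_to_pattern(consensus):
    """Turn a consensus such as 'RXDNXXR' into a regular expression string.

    'X' marks a nonconsensus position that may hold any amino acid;
    every other letter must match exactly.
    """
    parts = []
    for letter in consensus.upper():
        parts.append("[" + AMINO_ACIDS + "]" if letter == "X" else letter)
    return "".join(parts)


def matches_consensus(consensus, helix):
    """Return True if the recognition sequence fits the consensus."""
    return re.fullmatch(consensus_to_pattern(consensus), helix.upper()) is not None
```

For instance, a hypothetical sequence RSDNLAR fits RXDNXXR, whereas RSDELTR does not, because position +3 holds E rather than N.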

II. Characteristics of Zinc Finger Proteins 

Zinc finger proteins are formed from zinc finger components. For example, zinc 
finger proteins can have one to thirty-seven fingers, commonly having 2, 3, 4, 5 or 6 
fingers. A zinc finger protein recognizes and binds to a target site (sometimes referred to 
as a target segment) that represents a relatively small subsequence within a target gene. 
Each component finger of a zinc finger protein can bind to a subsite within the target site. 
The subsite includes a triplet of three contiguous bases all on the same strand (sometimes 
referred to as the target strand). The subsite may or may not also include a fourth base on 
the opposite strand that is the complement of the base immediately 3' of the three 
contiguous bases on the target strand. In many zinc finger proteins, a zinc finger binds to 
its triplet subsite substantially independently of other fingers in the same zinc finger 
protein. Accordingly, the binding specificity of a zinc finger protein containing multiple 
fingers is usually approximately the aggregate of the specificities of its component 
fingers. For example, if a zinc finger protein is formed from first, second and third 
fingers that individually bind to triplets XXX, YYY, and ZZZ, the binding specificity of 
the zinc finger protein is 3'XXX YYY ZZZ5'. 

The relative order of fingers in a zinc finger protein from N-terminal to C- 
terminal determines the relative order of triplets in the 3' to 5' direction in the target. 
For example, if a zinc finger protein comprises from N-terminal to C-terminal first, 
second and third fingers that individually bind, respectively, to triplets 5'GAC3', 
5'GTA3' and 5'GGC3', then the zinc finger protein binds to the target segment 
3'CAGATGCGG5'. If the zinc finger protein comprises the fingers in another order, for 
example, second finger, first finger, third finger, then the zinc finger protein binds to a 
target segment comprising a different permutation of triplets, in this example, 
3'ATGCAGCGG5' (see Berg & Shi, Science 271, 1081-1086 (1996)). The assessment 
of binding properties of a zinc finger protein as the aggregate of its component fingers 
may, in some cases, be influenced by context-dependent interactions of multiple fingers 
binding in the same protein. 

Two or more zinc finger proteins can be linked to have a target specificity that is 
the aggregate of that of the component zinc finger proteins (see e.g., Kim & Pabo, PNAS 
95, 2812-2817 (1998)). For example, a first zinc finger protein having first, second and 
third component fingers that respectively bind to XXX, YYY and ZZZ can be linked to a 
second zinc finger protein having first, second and third component fingers with binding 
specificities, AAA, BBB and CCC. The binding specificity of the combined first and 
second proteins is thus 3'XXXYYYZZZ___AAABBBCCC5', where the underline 
indicates a short intervening region (typically 0-5 bases of any type). In this situation, the 
target site can be viewed as comprising two target segments separated by an intervening 
segment. 
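
The relationship between finger order and target orientation, and the composite target of two linked proteins, can be sketched in a few lines of Python. This is a minimal illustration that assumes fully independent (modular) finger binding. The function names are hypothetical, and the second site used in the usage note is chosen only to exercise the code.

```python
import re


def target_site(finger_triplets):
    """Return (site_5to3, site_3to5) for a multi-finger protein.

    finger_triplets lists, from N-terminal to C-terminal finger, the triplet
    each finger binds, each written 5' to 3'. The N-terminal finger contacts
    the 3'-most triplet, so the 5'->3' site is the triplets in reverse
    finger order.
    """
    site_5to3 = "".join(reversed(finger_triplets))
    return site_5to3, site_5to3[::-1]


def linked_site_regex(first_site_5to3, second_site_5to3, max_gap=5):
    """Regex (5'->3') for the composite site of two linked proteins.

    In the 3'->5' notation of the text the first protein's segment is written
    first, so in 5'->3' notation the second protein's segment lies upstream,
    followed by an intervening region of 0..max_gap bases of any type.
    """
    return re.compile(
        f"{second_site_5to3}[ACGT]{{0,{max_gap}}}{first_site_5to3}"
    )
```

With the triplets of the example above, `target_site(["GAC", "GTA", "GGC"])` returns the 3'-to-5' reading CAGATGCGG, and swapping the first two fingers gives ATGCAGCGG.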

Linkage can be accomplished using any of the following peptide linkers: 
TGEKP (SEQ. ID. No:2) (Liu et al., 1997, supra); (G4S)n (SEQ. ID. No:3) (Kim et 
al., PNAS 93, 1156-1160 (1996)); GGRRGGGS (SEQ. ID. No:4); LRQRDGERP (SEQ. 
ID. No:5); LRQKDGGGSERP (SEQ. ID. No:6); LRQKD(G3S)2ERP (SEQ. ID. No:7). 
Alternatively, flexible linkers can be rationally designed using computer programs 
capable of modeling both DNA-binding sites and the peptides themselves or by phage 
display methods. In a further variation, noncovalent linkage can be achieved by fusing 
two zinc finger proteins with domains promoting heterodimer formation of the two zinc 
finger proteins. For example, one zinc finger protein can be fused with fos and the other 
with jun (see Barbas et al., WO 95/19431). 

Linkage of two zinc finger proteins is advantageous for conferring a unique 
binding specificity within a mammalian genome. A typical mammalian diploid genome 
consists of 3 x 10^9 bp. Assuming that the four nucleotides A, C, G, and T are randomly 
distributed, a given 9 bp sequence is present ~23,000 times. Thus a ZFP recognizing a 9 
bp target with absolute specificity would have the potential to bind to ~23,000 sites 
within the genome. An 18 bp sequence is present once in 3.4 x 10^10 bp, or about once in 
a random DNA sequence whose complexity is ten times that of a mammalian genome. 
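
The figures quoted above can be reproduced with a short calculation. The Python sketch below assumes random base composition, as the text does. The factor of two for the two DNA strands is an assumption made here because it reproduces the quoted numbers; the function names are illustrative.

```python
GENOME_BP = 3e9  # genome size used in the text


def expected_occurrences(site_length, genome_bp=GENOME_BP, strands=2):
    """Expected number of copies of one specific site in random DNA.

    A given site of length n occurs with probability 1 / 4**n at each
    position; both strands are scanned (assumption, see lead-in).
    """
    return strands * genome_bp / 4 ** site_length


def bp_per_occurrence(site_length, strands=2):
    """Length of random DNA expected to contain the site once."""
    return 4 ** site_length / strands


nine_bp_sites = expected_occurrences(9)    # about 2.3 x 10^4
eighteen_bp_span = bp_per_occurrence(18)   # about 3.4 x 10^10 bp
```

Under these assumptions a 9 bp site is expected roughly 23,000 times, and an 18 bp site once per roughly 3.4 x 10^10 bp, which is about ten times the genome size used here.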

A component finger of a zinc finger protein typically contains about 30 amino acids 
and has the following motif (N-C): 

(SEQ. ID. No:8) 

Cys-(X)2-4-Cys-X.X.X.X.X.X.X.X.X.X.X.X-His-(X)3-5-His 
                        -1 1 2 3 4 5 6 7 

The two invariant histidine residues and two invariant cysteine residues in a single 
beta turn are co-ordinated through zinc (see, e.g., Berg & Shi, Science 271, 1081-1085 
(1996)). The above motif shows a numbering convention that is standard in the field for 
the region of a zinc finger conferring binding specificity. The amino acid on the left (N- 
terminal side) of the first invariant His residue is assigned the number +6, and other 
amino acids further to the left are assigned successively decreasing numbers. The alpha 
helix begins at residue 1 and extends to the residue following the second conserved 
histidine. The entire helix is therefore of variable length, between 11 and 13 residues. 

The process of designing or selecting a nonnaturally occurring or variant ZFP 
typically starts with a natural ZFP as a source of framework residues. The process of 
design or selection serves to define nonconserved positions (i.e., positions -1 to +6) so as 
to confer a desired binding specificity. One suitable ZFP is the DNA binding domain of 
the mouse transcription factor Zif268. The DNA binding domain of this protein has the 
amino acid sequence: 

YACPVESCDRRFSRSDELTRHIRIHTGQKP (F1) (SEQ. ID. No:9) 
FQCRICMRNFSRSDHLTTHIRTHTGEKP (F2) (SEQ. ID. No:10) 
FACDICGRKFARSDERKRHTKIHLRQK (F3) (SEQ. ID. No:11) 
and binds to a target 5' GCG TGG GCG 3' (SEQ ID No:12). 
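
The numbering convention can be made concrete by applying it to the three Zif268 fingers listed above. The following Python sketch locates the C2H2 motif with a regular expression and labels the seven residues immediately preceding the first invariant histidine as positions -1 and +1 to +6. The pattern and the function name are illustrative assumptions, not part of the disclosure.

```python
import re

# Cys-(X)2-4-Cys-(X)12-His-(X)3-5-His; the 12 residues between the second
# cysteine and the first histidine are captured.
C2H2_MOTIF = re.compile(r"C.{2,4}C(.{12})H.{3,5}H")

# The residue just before the first invariant His is +6; counting back gives
# +5 ... +1 and then -1 (there is no position 0 in this convention).
POSITION_LABELS = [-1, 1, 2, 3, 4, 5, 6]


def recognition_positions(finger):
    """Map positions -1..+6 to the residues of a single C2H2 finger."""
    match = C2H2_MOTIF.search(finger)
    if match is None:
        raise ValueError("no C2H2 motif found in sequence")
    last_seven = match.group(1)[-7:]
    return dict(zip(POSITION_LABELS, last_seven))


ZIF268_FINGERS = {
    "F1": "YACPVESCDRRFSRSDELTRHIRIHTGQKP",
    "F2": "FQCRICMRNFSRSDHLTTHIRTHTGEKP",
    "F3": "FACDICGRKFARSDERKRHTKIHLRQK",
}

helices = {
    name: "".join(recognition_positions(seq).values())
    for name, seq in ZIF268_FINGERS.items()
}
```

Applied to the sequences above, this yields RSDELTR, RSDHLTT and RSDERKR for F1, F2 and F3, respectively.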

Another suitable natural zinc finger protein as a source of framework residues is 
Sp-1. The Sp-1 sequence used for construction of zinc finger proteins corresponds to 
amino acids 531 to 624 in the Sp-1 transcription factor. This sequence is 94 amino acids 
in length. The amino acid sequence of Sp-1 is as follows: 
PGKKKQHICHIQGCGKVYGKTSHLRAHLRWHTGERP 
FMCTWSYCGKRFTRSDELQRHKRTHTGEKK 
FACPECPKRFMRSDHLSKHIKTHQNKKG (SEQ. ID. No:13) 
Sp-1 binds to a target site 5'GGG GCG GGG3' (SEQ ID No: 14). 

An alternate form of Sp-1, an Sp-1 consensus sequence, has the following amino 
acid sequence: 
meklrngsgd 
PGKKKQHACPECGKSFSKSSHLRAHQRTHTGERP 
YKCPECGKSFSRSDELQRHQRTHTGEKP 
YKCPECGKSFSRSDHLSKHQRTHQNKKG (SEQ. ID. No:15) (lower case letters are a 
leader sequence from Shi & Berg, Chemistry and Biology 1, 83-89 (1995)). The optimal 
binding sequence for the Sp-1 consensus sequence is 5'GGGGCGGGG3' (SEQ ID No:16). 
Other suitable ZFPs are described below. 

There are a number of substitution rules that assist rational design of some zinc 
finger proteins (see Desjarlais & Berg, PNAS 90, 2256-2260 (1993); Choo & Klug, PNAS 
91, 11163-11167 (1994); Desjarlais & Berg, PNAS 89, 7345-7349 (1992); Jamieson et 
al., supra; Choo et al., WO 98/53057, WO 98/53058; WO 98/53059; WO 98/53060). 
Many of these rules are supported by site-directed mutagenesis of the three-finger domain 
of the ubiquitous transcription factor, Sp-1 (Desjarlais and Berg, 1992; 1993). One of 
these rules is that a 5' G in a DNA triplet can be bound by a zinc finger incorporating 
arginine at position 6 of the recognition helix. Another substitution rule is that a G in the 
middle of a subsite can be recognized by including a histidine residue at position 3 of a 
zinc finger. A further substitution rule is that asparagine can be incorporated to recognize 
A in the middle of a triplet, aspartic acid, glutamic acid, serine or threonine can be 
incorporated to recognize C in the middle of a triplet, and amino acids with small side 
chains such as alanine can be incorporated to recognize T in the middle of a triplet. A 
further substitution rule is that the 3' base of a triplet subsite can be recognized by 
incorporating the following amino acids at position -1 of the recognition helix: arginine 
to recognize G, glutamine to recognize A, glutamic acid (or aspartic acid) to recognize C, 
and threonine to recognize T. Although these substitution rules are useful in designing 
zinc finger proteins, they do not take into account all possible target sites. Furthermore, 
the assumption underlying the rules, namely that a particular amino acid in a zinc finger 
is responsible for binding to a particular base in a subsite, is only approximate. Context- 
dependent interactions between proximate amino acids in a finger or binding of multiple 
amino acids to a single base or vice versa can cause variation of the binding specificities 
predicted by the existing substitution rules. 
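
The substitution rules summarized above amount to a small lookup table. The Python sketch below encodes only the rules stated in this paragraph and returns candidate residues for positions -1, +3 and +6 of a finger intended to bind a given triplet. It is a hedged illustration: the names are hypothetical, no rule is given in the text for a 5' base other than G (so None is returned in that case), and, as noted above, such rules are only approximate.

```python
# Candidate residues (one-letter codes) taken from the rules in the text.
POSITION_6_BY_5PRIME_BASE = {"G": ["R"]}
POSITION_3_BY_MIDDLE_BASE = {
    "G": ["H"],
    "A": ["N"],
    "C": ["D", "E", "S", "T"],
    "T": ["A"],  # small side chains such as alanine
}
POSITION_MINUS1_BY_3PRIME_BASE = {
    "G": ["R"],
    "A": ["Q"],
    "C": ["E", "D"],
    "T": ["T"],
}


def suggest_residues(triplet):
    """Suggest residues at positions -1, +3 and +6 for a triplet (5'->3')."""
    triplet = triplet.upper()
    if len(triplet) != 3 or any(base not in "ACGT" for base in triplet):
        raise ValueError("triplet must be three bases drawn from A, C, G, T")
    five_prime, middle, three_prime = triplet
    return {
        -1: POSITION_MINUS1_BY_3PRIME_BASE[three_prime],
        3: POSITION_3_BY_MIDDLE_BASE[middle],
        6: POSITION_6_BY_5PRIME_BASE.get(five_prime),  # None if no rule stated
    }
```

For the triplet GCG, for example, the table suggests arginine at -1, one of D, E, S or T at +3, and arginine at +6. A starting point of this kind would still need to be checked empirically, particularly in view of the position dependence discussed in Section III.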

The technique of phage display provides a largely empirical means of generating 
zinc finger proteins with a desired target specificity (see e.g., Rebar, US 5,789,538; Choo 
et al., WO 96/06166; Barbas et al., WO 95/19431 and WO 98/54311; Jamieson et al., 
supra). The method can be used in conjunction with, or as an alternative to rational 
design. The method involves the generation of diverse libraries of mutagenized zinc 
finger proteins, followed by the isolation of proteins with desired DNA-binding 
properties using affinity selection methods. To use this method, the experimenter 
typically proceeds as follows. First, a gene for a zinc finger protein is mutagenized to 
introduce diversity into regions important for binding specificity and/or affinity. In a 
typical application, this is accomplished via randomization of a single finger at positions 
-1, +2, +3, and +6, and sometimes accessory positions such as +1, +5, +8 and +10. Next, 
the mutagenized gene is cloned into a phage or phagemid vector as a fusion with gene III 
of a filamentous phage, which encodes the coat protein pIII. The zinc finger gene is 
inserted between segments of gene III encoding the membrane export signal peptide and 
the remainder of pIII, so that the zinc finger protein is expressed as an amino-terminal 
fusion with pIII in the mature, processed protein. When using phagemid vectors, the 
mutagenized zinc finger gene may also be fused to a truncated version of gene III 
encoding, minimally, the C-terminal region required for assembly of pIII into the phage 
particle. The resultant vector library is transformed into E. coli and used to produce 
filamentous phage which express variant zinc finger proteins on their surface as fusions 
with the coat protein pIII. If a phagemid vector is used, then this step requires 
superinfection with helper phage. The phage library is then incubated with target DNA 
site, and affinity selection methods are used to isolate phage which bind target with high 
affinity from bulk phage. Typically, the DNA target is immobilized on a solid support, 
which is then washed under conditions sufficient to remove all but the tightest binding 
phage. After washing, any phage remaining on the support are recovered via elution 
under conditions which disrupt zinc finger - DNA binding. Recovered phage are used to 
infect fresh E. coli, which is then amplified and used to produce a new batch of phage 
particles. Selection and amplification are then repeated as many times as is necessary to 
enrich the phage pool for tight binders such that these may be identified using sequencing 
and/or screening methods. Although the method is illustrated for pIII fusions, analogous 
principles can be used to screen ZFP variants as pVIII fusions. 

In certain embodiments, the sequence bound by a particular zinc finger protein is 
determined by conducting binding reactions (see, e.g., conditions for determination of Kd, 
infra) between the protein and a pool of randomized double-stranded oligonucleotide 
sequences. The binding reaction is analyzed by an electrophoretic mobility shift assay 
(EMSA), in which protein-DNA complexes undergo retarded migration in a gel and can 
be separated from unbound nucleic acid. Oligonucleotides which have bound the finger 
are purified from the gel and amplified, for example, by a polymerase chain reaction. 
The selection (i.e., binding reaction and EMSA analysis) is then repeated as many times 
as desired, with the selected oligonucleotide sequences. In this way, the binding 
specificity of a zinc finger protein having a particular amino acid sequence is determined. 

Zinc finger proteins are often expressed with a heterologous domain as fusion 
proteins. Common domains for addition to the ZFP include, e.g., transcription factor 
domains (activators, repressors, co-activators, co-repressors), silencers, oncogenes (e.g., 
myc, jun, fos, myb, max, mad, rel, ets, bcl, myb, mos family members etc.); DNA repair 
enzymes and their associated factors and modifiers; DNA rearrangement enzymes and 
their associated factors and modifiers; chromatin associated proteins and their modifiers 
(e.g. kinases, acetylases and deacetylases); and DNA modifying enzymes (e.g., 
methyltransferases, topoisomerases, helicases, ligases, kinases, phosphatases, 
polymerases, endonucleases) and their associated factors and modifiers. A preferred 
domain for fusing with a ZFP when the ZFP is to be used for repressing expression of a 
target gene is a KRAB repression domain from the human KOX-1 protein (Thiesen et al., 
New Biologist 2, 363-374 (1990); Margolin et al., Proc. Natl. Acad. Sci. USA 91, 
4509-4513 (1994); Pengue et al., Nucl. Acids Res. 22:2908-2914 (1994); Witzgall et al., 
Proc. Natl. Acad. Sci. USA 91, 4514-4518 (1994)). Preferred domains for achieving 
activation include the HSV VP16 activation domain (see, e.g., Hagmann et al., J. Virol. 
71, 5952-5962 (1997)); nuclear hormone receptors (see, e.g., Torchia et al., Curr. Opin. 
Cell Biol. 10:373-383 (1998)); the p65 subunit of nuclear factor kappa B (Bitko & Barik, 
J. Virol. 72:5610-5618 (1998); Doyle & Hunt, Neuroreport 8:2937-2942 (1997); Liu et al., 
Cancer Gene Ther. 5:3-28 (1998)); or artificial chimeric functional domains such as 
VP64 (Seipel et al., EMBO J. 11, 4961-4968 (1992)). 

An important factor in the administration of polypeptide compounds, such as the 
ZFPs, is ensuring that the polypeptide has the ability to traverse the plasma membrane of 
a cell, or the membrane of an intra-cellular compartment such as the nucleus. Cellular 
membranes are composed of lipid-protein bilayers that are freely permeable to small, 
nonionic lipophilic compounds and are inherently impermeable to polar compounds, 
macromolecules, and therapeutic or diagnostic agents. However, proteins and other 
compounds such as liposomes have been described, which have the ability to translocate 
polypeptides such as ZFPs across a cell membrane. 

For example, "membrane translocation polypeptides" have amphiphilic or 
hydrophobic amino acid subsequences that have the ability to act as membrane- 
translocating carriers. In one embodiment, homeodomain proteins have the ability to 
translocate across cell membranes. The shortest internalizable peptide of a homeodomain 
protein, Antennapedia, was found to be the third helix of the protein, from amino acid 
position 43 to 58 (see, e.g., Prochiantz, Current Opinion in Neurobiology 6:629-634 
(1996)). Another subsequence, the h (hydrophobic) domain of signal peptides, was found 
to have similar cell membrane translocation characteristics (see, e.g., Lin et al., J. Biol. 
Chem. 270:14255-14258 (1995)). 

Examples of peptide sequences which can be linked to a ZFP, for facilitating 
uptake of ZFP into cells, include, but are not limited to: an 11 amino acid peptide of the 
tat protein of HIV; a 20 residue peptide sequence which corresponds to amino acids 84-103 
of the p16 protein (see Fahraeus et al., Current Biology 6:84 (1996)); the third helix 
of the 60-amino acid long homeodomain of Antennapedia (Derossi et al., J. Biol. Chem. 
269:10444 (1994)); the h region of a signal peptide such as the Kaposi fibroblast growth 
factor (K-FGF) h region (Lin et al., supra); or the VP22 translocation domain from HSV 
(Elliot & O'Hare, Cell 88:223-233 (1997)). Other suitable chemical moieties that 
provide enhanced cellular uptake may also be chemically linked to ZFPs. 

Toxin molecules also have the ability to transport polypeptides across cell 
membranes. Often, such molecules are composed of at least two parts (called "binary 
toxins"): a translocation or binding domain or polypeptide and a separate toxin domain 
or polypeptide. Typically, the translocation domain or polypeptide binds to a cellular 
receptor, and then the toxin is transported into the cell. Several bacterial toxins, 
including Clostridium perfringens iota toxin, diphtheria toxin (DT), Pseudomonas 
exotoxin A (PE), pertussis toxin (PT), Bacillus anthracis toxin, and pertussis adenylate 
cyclase (CYA), have been used in attempts to deliver peptides to the cell cytosol as 
internal or amino-terminal fusions (Arora et al., J. Biol. Chem. 268:3334-3341 (1993); 
Perelle et al., Infect. Immun. 61:5147-5156 (1993); Stenmark et al., J. Cell Biol. 
113:1025-1032 (1991); Donnelly et al., PNAS 90:3530-3534 (1993); Carbonetti et al., 
Abstr. Annu. Meet. Am. Soc. Microbiol. 95:295 (1995); Sebo et al., Infect. Immun. 
63:3851-3857 (1995); Klimpel et al., PNAS U.S.A. 89:10277-10281 (1992); and Novak et 
al., J. Biol. Chem. 267:17186-17193 (1992)). 

Such subsequences can be used to translocate ZFPs across a cell membrane. 
ZFPs can be conveniently fused to or derivatized with such sequences. Typically, the 
translocation sequence is provided as part of a fusion protein. Optionally, a linker can be 
used to link the ZFP and the translocation sequence. Any suitable linker can be used, 
e.g., a peptide linker. 

III. Position Dependence Of Subsite Recognition By Zinc Fingers 

A number of the polypeptides disclosed herein have been characterized using the 
methods disclosed in parent application Serial No. 09/716,637 (the disclosure of which is 
hereby incorporated by reference in its entirety); in particular with respect to the effect of 
their position, within a multi-finger protein, on their sequence specificity. The results of 
these investigations provide a set of zinc finger sequences that are optimized for 
recognition of certain triplet target subsites whose 5'-most nucleotide is a G (i.e., GNN 
triplet subsites). Thus, particular zinc finger sequences which recognize each of the GNN 
triplet subsites, from each position of a three-finger zinc finger protein, are provided. See 
Figure 2. It will be clear to those of skill in the art that the optimized, position-specific 
zinc finger sequences disclosed herein for recognition of GNN target subsites are not 
limited to use in three-finger proteins. For example, they are also useful in six-finger 
proteins, which can be made by linkage of two three-finger proteins. 

A number of zinc finger amino acid sequences which are reported to bind to target 
subsites in which the 5'-most nucleotide residue is G (i.e., GNN subsites) have recently 
been disclosed. Segal et al. (1999) Proc. Natl. Acad. Sci. USA 96:2758-2763; Dreier et 
al. (2000) J. Mol. Biol. 303:489-502; U.S. Patent No. 6,140,081. These GNN-binding 
zinc fingers were obtained by selection of finger 2 sequences from phage display libraries 
of three-finger proteins, in which certain amino acid residues of finger 2 had been 
randomized. Due to the manner in which they were selected, it is not clear whether these 
sequences would have the same target subsite specificity if they were present in the F1 
and/or F3 positions. 

Use of the methods and compositions disclosed herein has now allowed 
identification of specific zinc finger sequences that bind each of the 16 GNN triplet 
subsites, and for the first time, provides zinc finger sequences that are optimized for 
recognition of these triplet subsites in a position-dependent fashion. Moreover, in vivo 
studies of these optimized designs reveal that the functionality of a ZFP is correlated with 
its binding affinity to its target sequence. See Example 6, infra. 

As a result of the discovery, disclosed herein, that sequence recognition by zinc 
fingers is position-dependent, it is clear that existing design rules will not, in and of 
themselves, be applicable to every situation in which it is necessary to construct a 
sequence-specific ZFP. The results disclosed herein show that many zinc fingers that are 
constructed based on design rules exhibit the sequence specificity predicted by those 
design rules only at certain finger positions. The position-specific zinc fingers disclosed 
herein are likely to function more efficiently in vivo and in cultured cells, with fewer 
nonspecific effects. Highly specific ZFPs, made using position-specific zinc fingers, will 
be useful tools in studying gene function and will find broad applications in areas as 
diverse as human therapeutics and plant engineering. 

IV. Production of Zinc Finger Proteins 

ZFP polypeptides and nucleic acids encoding the same can be made using routine 
techniques in the field of recombinant genetics. Basic texts disclosing the general 
methods include Sambrook et al., Molecular Cloning, A Laboratory Manual (2nd ed. 
1989); Kriegler, Gene Transfer and Expression: A Laboratory Manual (1990); and 
Current Protocols in Molecular Biology (Ausubel et al., eds., 1994). In addition, 
nucleic acids less than about 100 bases can be custom ordered from any of a variety of 
commercial sources, such as The Midland Certified Reagent Company 
(mcrc@oligos.com), The Great American Gene Company (http://www.genco.com), 
ExpressGen Inc. (www.expressgen.com), and Operon Technologies Inc. (Alameda, CA). 
Similarly, peptides can be custom ordered from any of a variety of sources, such as 
PeptidoGenic (pldm@ccnet.com), HTI Bio-products, Inc. (http://www.htibio.com), BMA 
Biomedicals Ltd (U.K.), and Bio.Synthesis, Inc. 

Oligonucleotides can be chemically synthesized according to the solid phase 
phosphoramidite triester method first described by Beaucage & Caruthers, Tetrahedron 
Letts. 22:1859-1862 (1981), using an automated synthesizer, as described in Van 
Devanter et al., Nucleic Acids Res. 12:6159-6168 (1984). Purification of 
oligonucleotides is by either denaturing polyacrylamide gel electrophoresis or by reverse 
phase HPLC. The sequence of the cloned genes and synthetic oligonucleotides can be 
verified after cloning using, e.g., the chain termination method for sequencing double- 
stranded templates of Wallace et al., Gene 16:21-26 (1981). 

Two alternative methods are typically used to create the coding sequences 
required to express newly designed DNA-binding peptides. One protocol is a PCR-based 
assembly procedure that utilizes six overlapping oligonucleotides (Fig. 1). Three 
oligonucleotides (oligos 1, 3, and 5 in Figure 1) correspond to "universal" sequences that 
encode portions of the DNA-binding domain between the recognition helices. These 
oligonucleotides typically remain constant for all zinc finger constructs. The other three 
"specific" oligonucleotides (oligos 2, 4, and 6 in Fig. 1) are designed to encode the 
recognition helices. These oligonucleotides contain substitutions primarily at positions 
-1, 2, 3 and 6 on the recognition helices, making them specific for each of the different 
DNA-binding domains. 

The PCR synthesis is carried out in two steps. First, a double stranded DNA 
template is created by combining the six oligonucleotides (three universal, three specific) 
in a four cycle PCR reaction with a low temperature annealing step, thereby annealing the 
oligonucleotides to form a DNA "scaffold." The gaps in the scaffold are filled in by 
high-fidelity thermostable polymerase; the combination of Taq and Pfu polymerases also 
suffices. In the second phase of construction, the zinc finger template is amplified by 
external primers designed to incorporate restriction sites at either end for cloning into a 
shuttle vector or directly into an expression vector. 

An alternative method of cloning the newly designed DNA-binding proteins relies 
on annealing complementary oligonucleotides encoding the specific regions of the 
desired ZFP. This particular application requires that the oligonucleotides be 
phosphorylated prior to the final ligation step. This is usually performed before setting 
up the annealing reactions. In brief, the "universal" oligonucleotides encoding the 
constant regions of the proteins (oligos 1, 2 and 3 of above) are annealed with their 
complementary oligonucleotides. Additionally, the "specific" oligonucleotides encoding 
the finger recognition helices are annealed with their respective complementary 
oligonucleotides. These complementary oligos are designed to fill in the region which 
was previously filled in by polymerase in the above-mentioned protocol. The 
complementary oligos to the common oligos 1 and finger 3 are engineered to leave 
overhanging sequences specific for the restriction sites used in cloning into the vector of 
choice in the following step. The second assembly protocol differs from the initial 
protocol in the following aspects: the "scaffold" encoding the newly designed ZFP is 
composed entirely of synthetic DNA, thereby eliminating the polymerase fill-in step; 
additionally, the fragment to be cloned into the vector does not require amplification. 
Lastly, the design of leaving sequence-specific overhangs eliminates the need for 
restriction enzyme digests of the inserting fragment. Alternatively, changes to ZFP 
recognition helices can be created using conventional site-directed mutagenesis methods. 

Both assembly methods require that the resulting fragment encoding the newly 
designed ZFP be ligated into a vector. Ultimately, the ZFP-encoding sequence is cloned 
into an expression vector. Expression vectors that are commonly utilized include, but are 
not limited to, a modified pMAL-c2 bacterial expression vector (New England BioLabs) 
or a eukaryotic expression vector, pcDNA (Promega). The final constructs are verified 
by sequence analysis. 

Any suitable method of protein purification known to those of skill in the art can 
be used to purify ZFPs (see, Ausubel, supra, Sambrook, supra). In addition, any suitable 
host can be used for expression, e.g., bacterial cells, insect cells, yeast cells, mammalian 
cells, and the like. 

Expression of a zinc finger protein fused to a maltose binding protein (MBP-ZFP) 
in bacterial strain JM109 allows for straightforward purification through an amylose 
column (NEB). High expression levels of the zinc finger chimeric protein can be 
obtained by induction with IPTG since the MBP-ZFP fusion in the pMal-c2 expression 
plasmid is under the control of the tac promoter (NEB). Bacteria containing the MBP-ZFP 
fusion plasmids are inoculated into 2xYT medium containing 10 µM ZnCl2, 0.02% 
glucose, plus 50 µg/ml ampicillin and shaken at 37°C. At mid-exponential growth IPTG 
is added to 0.3 mM and the cultures are allowed to shake. After 3 hours the bacteria are 
harvested by centrifugation, disrupted by sonication or by passage through a French 
pressure cell or through the use of lysozyme, and insoluble material is removed by 
centrifugation. The MBP-ZFP proteins are captured on an amylose-bound resin, washed 
extensively with buffer containing 20 mM Tris-HCl (pH 7.5), 200 mM NaCl, 5 mM DTT 
and 50 µM ZnCl2, then eluted with maltose in essentially the same buffer (purification is 
based on a standard protocol from NEB). Purified proteins are quantitated and stored for 
biochemical analysis. 

The dissociation constants (Kd) of the purified proteins are typically 
characterized via electrophoretic mobility shift assays (EMSA) (Buratowski & Chodosh, 
in Current Protocols in Molecular Biology pp. 12.2.1-12.2.7 (Ausubel ed., 1996)). 
Affinity is measured by titrating purified protein against a fixed amount of labeled 
double-stranded oligonucleotide target. The target typically comprises the natural 
binding site sequence flanked by the 3 bp found in the natural sequence and additional, 
constant flanking sequences. The natural binding site is typically 9 bp for a three-finger 
protein and 2 x 9 bp + intervening bases for a six-finger ZFP. The annealed 
oligonucleotide targets possess a 1-base 5' overhang, which allows for efficient labeling 
of the target with T4 phage polynucleotide kinase. For the assay, the target is added at a 
concentration of 1 nM or lower (the actual concentration is kept at least 10-fold lower 
than the expected dissociation constant), purified ZFPs are added at various 
concentrations, and the reaction is allowed to equilibrate for at least 45 min. In addition, 
the reaction mixture contains 10 mM Tris (pH 7.5), 100 mM KCl, 1 mM MgCl2, 0.1 
mM ZnCl2, 5 mM DTT, 10% glycerol, and 0.02% BSA. (NB: in earlier assays poly d(IC) 
was also added at 10-100 µg/µl.) 

The equilibrated reactions are loaded onto a 10% polyacrylamide gel, which has 
been pre-run for 45 min in Tris/glycine buffer, then bound and unbound labeled target is 
resolved by electrophoresis at 150 V. (Alternatively, 10-20% gradient Tris-HCl gels, 
containing a 4% polyacrylamide stacker, can be used.) The dried gels are visualized by 
autoradiography or phosphorimaging and the apparent Kd is determined by calculating 
the protein concentration that gives half-maximal binding. 
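
By way of illustration only, the half-maximal-binding calculation can be sketched in Python as follows. The sketch assumes simple 1:1 binding with the labeled target well below the Kd, so that total protein approximates free protein. The function name `apparent_kd`, the log-scale interpolation, and the simulated titration values are assumptions made for this example and are not taken from the assays described herein.

```python
import math

def apparent_kd(protein_nM, fraction_bound):
    """Return the protein concentration (nM) giving half-maximal binding.

    Interpolates on a log-concentration scale between the two titration
    points that bracket half of the maximum observed fraction bound.
    """
    points = sorted(zip(protein_nM, fraction_bound))
    half = max(f for _, f in points) / 2.0
    for (c_lo, f_lo), (c_hi, f_hi) in zip(points, points[1:]):
        if f_lo <= half <= f_hi and f_hi > f_lo:
            t = (half - f_lo) / (f_hi - f_lo)
            log_c = math.log(c_lo) + t * (math.log(c_hi) - math.log(c_lo))
            return math.exp(log_c)
    raise ValueError("titration does not bracket half-maximal binding")

# Hypothetical two-fold titration, simulated for a protein with Kd = 2 nM.
concentrations = [0.125 * 2 ** i for i in range(10)]   # 0.125 ... 64 nM
fractions = [c / (c + 2.0) for c in concentrations]    # simple binding isotherm
estimate = apparent_kd(concentrations, fractions)
print(f"apparent Kd ~ {estimate:.2f} nM")
```

Because the highest point of a real titration rarely reaches full saturation, an estimate obtained this way tends to fall slightly below the true Kd; fitting the whole isotherm is a common alternative.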

The assays can also include determining active fractions in the protein 
preparations. Active fractions are determined by stoichiometric gel shifts where proteins 
are titrated against a high concentration of target DNA. Titrations are done at 100, 50, 
and 25% of target (usually at micromolar levels). 
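
As a hedged sketch of the active-fraction determination, the following Python fragment assumes that, under stoichiometric conditions (target far above the Kd), every active protein molecule shifts one target molecule. It further reads the titration as protein added at nominal concentrations of 100, 50 and 25% of the target concentration, which is an interpretation. The function name and all numerical readings are hypothetical.

```python
def active_fraction(target_uM, fraction_target_shifted, protein_uM):
    """Fraction of a protein preparation that is competent to bind DNA.

    Under stoichiometric conditions the concentration of shifted target
    equals the concentration of active protein.
    """
    if protein_uM <= 0:
        raise ValueError("protein concentration must be positive")
    bound_uM = target_uM * fraction_target_shifted
    return bound_uM / protein_uM

# Hypothetical gel-shift readings at 100, 50 and 25% of a 1 uM target.
target = 1.0
titration = [(1.0, 0.60), (0.5, 0.30), (0.25, 0.15)]  # (protein uM, fraction shifted)
estimates = [active_fraction(target, shifted, protein) for protein, shifted in titration]
mean_active = sum(estimates) / len(estimates)
print(f"mean active fraction ~ {mean_active:.2f}")
```

Agreement among the three titration points is a useful internal check; an apparent Kd can then be corrected by multiplying the nominal protein concentration by the active fraction.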

V. Applications of Engineered Zinc Finger Proteins 

ZFPs that bind to a particular target gene, and the nucleic acids encoding them, 
can be used for a variety of applications. These applications include therapeutic methods 
in which a ZFP or a nucleic acid encoding it is administered to a subject and used to 
modulate the expression of a target gene within the subject. See, for example, co-owned 
WO 00/41566. The modulation can be in the form of repression, for example, when the 
target gene resides in a pathological infecting microorganism, or in an endogenous gene 
of the patient, such as an oncogene or viral receptor, that is contributing to a disease state. 
Alternatively, the modulation can be in the form of activation when activation of 
expression or increased expression of an endogenous cellular gene can ameliorate a 
diseased state. For such applications, ZFPs, or more typically, nucleic acids encoding 
them are formulated with a pharmaceutically acceptable carrier as a pharmaceutical 
composition. 

Pharmaceutically acceptable carriers are determined in part by the particular 
composition being administered, as well as by the particular method used to administer 
the composition (see, e.g., Remington's Pharmaceutical Sciences, 17th ed. 1985). The 
ZFPs, alone or in combination with other suitable components, can be made into aerosol 
formulations (i.e., they can be "nebulized") to be administered via inhalation. Aerosol 
formulations can be placed into pressurized acceptable propellants, such as 
dichlorodifluoromethane, propane, nitrogen, and the like. Formulations suitable for 
parenteral administration, such as, for example, by intravenous, intramuscular, 
intradermal, and subcutaneous routes, include aqueous and non-aqueous, isotonic sterile 
injection solutions, which can contain antioxidants, buffers, bacteriostats, and solutes that 
render the formulation isotonic with the blood of the intended recipient, and aqueous and 
non-aqueous sterile suspensions that can include suspending agents, solubilizers, 
thickening agents, stabilizers, and preservatives. Compositions can be administered, for 
example, by intravenous infusion, orally, topically, intraperitoneally, intravesically or 
intrathecally. The formulations of compounds can be presented in unit-dose or 
multi-dose sealed containers, such as ampules and vials. Injection solutions and suspensions 
can be prepared from sterile powders, granules, and tablets of the kind previously 
described. 

The dose administered to a patient should be sufficient to effect a beneficial 
therapeutic response in the patient over time. The dose is determined by the efficacy and 
Kd of the particular ZFP employed, the target cell, and the condition of the patient, as 
well as the body weight or surface area of the patient to be treated. The size of the dose 
also is determined by the existence, nature, and extent of any adverse side-effects that 
accompany the administration of a particular compound or vector in a particular patient. 

In other applications, ZFPs are used in diagnostic methods for sequence specific 
detection of target nucleic acid in a sample. For example, ZFPs can be used to detect 
variant alleles associated with a disease or phenotype in patient samples. As an example, 
ZFPs can be used to detect the presence of particular mRNA species or cDNA in a 
complex mixture of mRNAs or cDNAs. As a further example, ZFPs can be used to 
quantify copy number of a gene in a sample. For example, detection of loss of one copy 
of a p53 gene in a clinical sample is an indicator of susceptibility to cancer. In a further 
example, ZFPs are used to detect the presence of pathological microorganisms in clinical 
samples. This is achieved by using one or more ZFPs specific to genes within the 
microorganism to be detected. A suitable format for performing diagnostic assays 
employs ZFPs linked to a domain that allows immobilization of the ZFP on an ELISA 
plate. The immobilized ZFP is contacted with a sample suspected of containing a target 
nucleic acid under conditions in which binding can occur. Typically, nucleic acids in the 
sample are labeled (e.g., in the course of PCR amplification). Alternatively, unlabelled 
probes can be detected using a second labelled probe. After washing, bound-labelled 
nucleic acids are detected. 

ZFPs also can be used for assays to determine the phenotype and function of gene 
expression. Current methodologies for determination of gene function rely primarily 
upon either overexpressing the gene of interest or removing it (knocking it out 
completely) from its natural biological setting and observing the effects. The phenotypic 
effects observed indicate the role of the gene in the biological system. 

One advantage of ZFP-mediated regulation of a gene relative to conventional 
knockout analysis is that expression of the ZFP can be placed under small molecule 
control. By controlling expression levels of the ZFPs, one can in turn control the 
expression levels of a gene regulated by the ZFP to determine what degree of repression 
or stimulation of expression is required to achieve a given phenotypic or biochemical 
effect. This approach has particular value for drug development. By putting the ZFP 
under small molecule control, problems of embryonic lethality and developmental 
compensation can be avoided by switching on the ZFP repressor at a later stage in mouse 
development and observing the effects in the adult animal. Transgenic mice having 
target genes regulated by a ZFP can be produced by integration of the nucleic acid 
encoding the ZFP at any site in trans to the target gene. Accordingly, homologous 
recombination is not required for integration of the nucleic acid. Further, because the 
ZFP is trans-dominant, only one chromosomal copy is needed and therefore functional 
knock-out animals can be produced without backcrossing. 

All references cited above are hereby incorporated by reference in their entirety 
for all purposes. 

EXAMPLES 

Example 1 : Initial design of zinc finger proteins and determination of 
binding affinity 

Initial ZFP designs were based on existing design rules, correspondence regimes 
and ZFP directories, including those disclosed herein (see Tables 1-5) and also in 
WO 98/53058; WO 98/53059; WO 98/53060 and co-owned US patent application 
Serial No. 09/444,241. See also WO 00/42219. Amino acid sequences were 
conceptually designed using amino acids 532-624 of the human transcription factor Sp1 
as a backbone. Polynucleotides encoding designed ZFPs were assembled using a 
Polymerase Chain Reaction (PCR)-based procedure that utilizes six overlapping 
oligonucleotides. PCR products were directly cloned into the Tac promoter 
vector, pMal-c2 (New England Biolabs, Beverly, MA) using the KpnI and BamHI 
restriction sites. The encoded maltose binding protein-ZFP fusion polypeptides were 
purified according to the manufacturer's procedures (New England Biolabs, Beverly, 
MA). Binding affinity was measured by gel mobility-shift analysis. All of these 
procedures are described in detail in co-owned WO 00/41566 and WO 00/42219, as well 
as in Zhang et al. (2000) J. Biol. Chem. 275:33,850-33,860 and Liu et al. (2001) J. Biol. 
Chem. 276:11,323-11,334; the disclosures of which are hereby incorporated by reference 
in their entireties. 

Example 2: Optimization of binding specificity by site selection 

Designed ZFPs were tested for binding specificity using site selection methods 
disclosed in parent application USSN 09/716,637. Briefly, designed proteins were 
incubated with a population of labeled, double-stranded oligonucleotides comprising a 
library of all possible 9- or 10-nucleotide target sequences. Five nanomoles of labeled 
oligonucleotides were incubated with protein, at a protein concentration 4-fold above its 
Kd for its target sequence. The mixture was subjected to gel electrophoresis, and bound 
oligonucleotides were identified by mobility shift and extracted from the gel. The 
purified bound oligonucleotides were amplified, and the amplification products were 
used for a subsequent round of selection. At each round of selection, the protein 
concentration was decreased 2-fold. After 3-5 rounds of selection, amplification 
products were cloned into the TOPO TA cloning vector (Invitrogen, Carlsbad, CA), and 
the nucleotide sequences of approximately 20 clones were determined. The identities of 
the target sites bound by a designed protein were determined from the sequences and 
expressed as a compilation of subsite binding sequences. 
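
The bookkeeping behind this procedure can be sketched in Python as shown below; this is an illustration under stated assumptions rather than the software actually used. The sketch computes the protein concentration for each round (4-fold above the Kd, then halved per round) and tallies the triplet found at each finger's subsite across sequenced clones. It assumes 9-nucleotide sites written 5' to 3', with finger 3 reading the 5'-most triplet and finger 1 the 3'-most triplet, which matches the finger assignments given for EP2C in Example 5. The function names and clone sequences are hypothetical.

```python
from collections import Counter

def protein_schedule(kd_nM, rounds, start_fold=4.0, step=2.0):
    """Protein concentration (nM) per round: start_fold x Kd, reduced step-fold each round."""
    return [kd_nM * start_fold / step ** i for i in range(rounds)]

def subsite_compilation(clone_sites):
    """Tally the triplet found at each finger's subsite across sequenced clones.

    Each site is a 9-nucleotide sequence written 5'->3'; finger 3 is taken
    to read the 5'-most triplet and finger 1 the 3'-most triplet.
    """
    tallies = {"F3": Counter(), "F2": Counter(), "F1": Counter()}
    for site in clone_sites:
        site = site.upper()
        if len(site) != 9:
            raise ValueError(f"expected a 9-nt site, got {site!r}")
        tallies["F3"][site[0:3]] += 1
        tallies["F2"][site[3:6]] += 1
        tallies["F1"][site[6:9]] += 1
    return tallies

# Hypothetical selected sites from four sequenced clones.
clones = ["GCGGTGGCT", "GTGGTGGCT", "GCGGTGGCT", "GTGGTGGCT"]
for finger, counts in subsite_compilation(clones).items():
    print(finger, counts.most_common())

# Concentrations for four rounds, for a protein with a Kd of 50 nM.
print(protein_schedule(kd_nM=50.0, rounds=4))
```

In practice approximately 20 clones are sequenced per protein, so each tally is a small-sample estimate of subsite preference.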

Example 3: Comparison of site selection results with binding affinity 

To test the correlation between site selection results and the affinity of binding of 
a ZFP to various related targets, site selection experiments were conducted on two 
three-finger ZFPs, denoted ZFP1 and ZFP2, and the site selection results were compared with 
Kd measurements obtained from quantitative gel-mobility shift assays using the same 
ZFPs and target sites. Each ZFP was constructed, based on design rules, to bind to a 
particular nine-nucleotide target sequence (comprising three three-nucleotide subsites), as 
shown in Figure 1. Site selection results and affinity measurements are also shown in 
Figure 1. The site selection results showed that fingers 1 and 3 of both the ZFP1 and 
ZFP2 proteins preferentially selected their intended target sequences. However, the 
second finger of each ZFP preferentially selected a subsite other than the one to which it 
was designed to bind (e.g., F2 of ZFP1 was designed to bind TCG, but preferentially 
selected GTG; F2 of ZFP2 was designed to bind GGT, but preferentially selected GGA). 

To confirm the site selection results, binding affinities of ZFP1 and ZFP2 were 
measured (see Example 1, supra), both to their original target sequences and to new 
target sequences reflecting the site selection results. For example, the Mt-1 sequence 
contains two base changes (compared to the original target sequence for ZFP1) which 
result in a change in the sequence of the finger 2 subsite to GTG, reflecting the preferred 
finger 2 subsite sequence obtained by site selection. In agreement with the site selection 
results, binding of ZFP1 to the Mt-1 sequence is approximately 4-fold stronger than its 
binding to the original target sequence (Kd of 12.5 nM compared to a Kd of 50 nM, see 
Figure 1). 

For ZFP2, the specificity of finger 2 for the 3' base of its target subsite was tested, 
since, although this finger was designed to bind GGT, site selection indicated that it 
bound preferentially to GGA. Moreover, the site selection results predicted that finger 2 
of ZFP2 would bind with approximately equal affinity to GGT and GGC. Accordingly, 
target sequences containing GGA (Mt-3) and GGC (Mt-4) at the finger 2 subsite were 
constructed, and binding affinities of ZFP2 to these target sequences, and to its original 
target sequence (containing GGT at the finger 2 subsite), were compared. In complete 
agreement with the site selection results, ZFP2 exhibited the strongest binding affinity for 
the target sequence containing GGA at the finger 2 subsite (Kd of 0.5 nM, Figure 1), and 
its affinity for target sequences containing either GGT or GGC at the finger 2 subsite was 
approximately equal (Kd of 1 nM for both targets, Figure 1). Accordingly, the site 
selection method, in addition to being useful for iterative optimization of binding 
specificity, can also serve as an indicator of binding affinity. 

Example 4: Use of site selection to identify position-dependent, GNN-binding 
zinc fingers 

A large number of engineered ZFPs have been evaluated, by site selection, to 
identify zinc fingers that bind to GNN target subsites. In the course of these studies, it 
became apparent that the binding specificity of a particular zinc finger sequence is, in 
some instances, dependent upon the position of the zinc finger in the protein, and hence 
upon the location of the target subsite within the target sequence. For example, if one 
wishes to design a three-finger zinc finger protein to bind to a target sequence containing 
the triplet subsite GAT, it is necessary to know whether this subsite is the first, second or 
third subsite in the target sequence (i.e., whether the GAT subsite will be bound by the 
first, second or third finger of the protein). Accordingly, over 110 three-finger zinc 
finger proteins, containing potential GNN-recognizing zinc fingers in various locations, 
have been evaluated by site selection experiments. Generally, several zinc finger 
sequences were designed to recognize each GNN triplet, and each design was tested in 
each of the F1, F2 and F3 positions through 4 to 6 rounds of selection. 

The results of these analyses, shown in Figure 2, provide optimal position-dependent 
zinc finger sequences (the sequences shown represent amino acid residues -1 
through +6 of the recognition helix portion of the finger) for recognition of the 16 GNN 
target subsites, as well as site selection results for these GNN-specific zinc fingers. 
Optimal amino acid sequences for recognition of each GNN subsite from each of three 
positions (finger 1, finger 2 or finger 3) are thereby provided. 

GNG-binding finger designs 

The amino acid sequence RSDXLXR (position -1 to +6 of the recognition helix) 
was found to be optimal for binding to the four GNG triplets, with Asn+3 specifying A as 
the middle nucleotide; His+3 specifying G as the middle nucleotide; Ala+3 specifying T as 
the middle nucleotide; and Asp+3 specifying C as the middle nucleotide. At the +5 
position, Ala, Thr, Ser, and Gln were tested, and all showed similar specificity profiles 
by site selection. Interestingly, and in contrast to a previous report (Swirnoff et al. (1995) 
Mol. Cell. Biol. 15:2275-2287), site selection results indicated that three 
naturally-occurring GCG-binding fingers from zif268 and Sp1, having the amino acid sequences 
RSDELTR, RSDELQR, and RSDERKR, were not GCG-specific. Rather, each of these 
fingers selected almost equal numbers of GCG and GTG sequences. Analysis of binding 
affinity by gel-shift experiments confirmed that finger 3 of zif268, having the sequence 
RSDERKR, binds GCG and GTG with approximately equal affinity. 

Position dependence of GCA-, GAT-, GGT-, GAA- and GCC-binding fingers 
Based on existing design rules, the amino acid sequence QSGDLTR (-1 through 
+6) was tested for its ability to bind the GCA triplet from three positions (F1, F2, and F3) 
within a three-finger ZFP. Figure 3A shows that the QSGDLTR sequence bound 
preferentially to the GCA triplet subsite from the F2 and F3 positions, but not from F1. 
In fact, the presence of QSGDLTR at the F1 position of three different three-finger ZFPs 
resulted predominantly in selection of GCT. Accordingly, an attempt was made to 
redesign this sequence to obtain specificity for GCA from the F1 position. Since the 
sequence Q(-1)G(+2)S(+3)R(+6) had previously been selected from a randomized F1 library using 
GCA as target (Rebar et al. (1994) Science 263:671-673), a D (Asp) to S (Ser) change was 
made at the +3 residue of this finger. The resulting sequence, QSGSLTR, was tested for 
its binding specificity by site selection and found to preferentially bind GCA, from the F1 
position, in three different ZFPs (see Figure 2). 

The QSGSLTR zinc finger, optimized for recognition of the GCA subsite from 
the F1 position, was tested for its selectivity when located at the F2 position. 
Accordingly, two ZFPs, one containing QSGSLTR at finger 2 and one containing 
QSGDLTR at finger 2 (both having identical F1 sequences and identical F3 sequences) 
were tested by site selection. The results indicated that, when used at the F2 position, 
QSGSLTR bound preferentially to GTA, rather than GCA. Thus, for optimal binding of 
a GCA triplet subsite from the F1 position, the amino acid sequence QSGSLTR is 
required; while, for optimal binding of the same subsite sequence from F2 or F3, 
QSGDLTR should be used. Accordingly, different zinc finger amino acid sequences may 
be needed to specify a particular triplet subsite sequence, depending upon the location of 
the subsite within the target sequence and, hence, upon the position of the finger in the 
protein. 

Positional effects were also observed for zinc fingers recognizing GAT and GGT 
subsites. The zinc finger amino acid sequence QSSNLAR (-1 through +6) is expected to 
bind to GAT, based on design rules. However, this sequence selected GAT only from the 
F1 position, and not from the F2 and F3 positions, from which the sequence GAA was 
preferentially bound (Figure 3B). Similarly, the amino acid sequence QSSHLTR which, 
based on design rules, should bind GGT, selected GGT at the F1 position, but not at the 
F2 and F3 positions, from which it preferentially bound GGA (Figure 3C). Conversely, 
the amino acid sequence TSGHLVR has previously been disclosed to recognize the 
triplet GGT, based on its selection from a randomized library of zif268 finger 2 (U.S. 
Patent No. 6,140,081). However, TSGHLVR was not specific for the GGT subsite when 
located at the F1 position (Figure 3C). These results indicate that the binding specificity 
of many fingers is position dependent, and particularly point out that the sequence 
specificity of a zinc finger selected from an F2 library may be positionally limited. 

The results shown in Figure 2 indicate that recognition of at least GAA and GCC 
triplets by zinc fingers is also position dependent. 
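
One way to picture how such position-dependent design rules might be applied is a lookup keyed on both the triplet and the finger position. The Python sketch below encodes only the helices named in the text of this Example (the complete set is in Figure 2 and is not reproduced here). It treats the RSDXLXR pattern as usable for GNG triplets at any position and defaults the +5 residue to Thr; both choices are simplifying assumptions. All function and variable names are hypothetical.

```python
# Recognition helices (-1 through +6) stated in the text of this Example.
POSITION_SPECIFIC = {
    ("GCA", "F1"): "QSGSLTR",
    ("GCA", "F2"): "QSGDLTR",
    ("GCA", "F3"): "QSGDLTR",
    ("GAT", "F1"): "QSSNLAR",
    ("GGT", "F1"): "QSSHLTR",
}

# Residue at +3 that specifies the middle nucleotide of a GNG triplet.
GNG_PLUS3 = {"A": "N", "G": "H", "T": "A", "C": "D"}

def gng_helix(triplet, plus5="T"):
    """Build an RSDXLXR helix for a GNG triplet; +5 may be A, T, S or Q."""
    triplet = triplet.upper()
    if len(triplet) != 3 or triplet[0] != "G" or triplet[2] != "G":
        raise ValueError(f"{triplet!r} is not a GNG triplet")
    if triplet[1] not in GNG_PLUS3:
        raise ValueError(f"unknown middle nucleotide in {triplet!r}")
    if plus5 not in {"A", "T", "S", "Q"}:
        raise ValueError("the +5 residue must be one of A, T, S, Q")
    return "RSD" + GNG_PLUS3[triplet[1]] + "L" + plus5 + "R"

def recognition_helix(triplet, position):
    """Return a helix for the triplet at finger position 'F1', 'F2' or 'F3'."""
    triplet = triplet.upper()
    if len(triplet) == 3 and triplet[0] == "G" and triplet[2] == "G":
        return gng_helix(triplet)
    try:
        return POSITION_SPECIFIC[(triplet, position)]
    except KeyError:
        raise LookupError(
            f"no helix for {triplet} at {position} is stated in the text; see Figure 2"
        ) from None

print(recognition_helix("GCA", "F1"))  # QSGSLTR
print(recognition_helix("GCA", "F2"))  # QSGDLTR
```

Raising an error for combinations not covered by the text, instead of falling back to a position-independent helix, reflects the point of this Example: a helix validated at one position cannot be assumed to keep its specificity at another.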

These positional dependences stand in contrast to earlier published work, which 
suggested that zinc fingers behaved as independent modules with respect to the sequence 
specificity of their binding to DNA. Desjarlais et al. (1993) Proc. Natl. Acad. Sci. USA 
90:2256-2260. 

Example 5: Characterization of EP2C 

The engineered zinc finger protein EP2C binds to a target sequence, 
GCGGTGGCT, with a dissociation constant (Kd) of 2 nM. Site selection results indicated 
that fingers 1 and 2 are highly specific for their target subsites, while finger 3 selects 
GCG (its intended target subsite) and GTG at approximately equal frequencies 
(Figure 4A). To confirm these observations, the binding affinities of EP2C to its cognate 
target sequence, and to variant target sequences, were measured by standard gel-shift 
analyses (see Example 1, supra). As standards for comparison, the binding affinities of 
Sp1 and zif268 to their respective targets were also measured under the same conditions, 
and were determined to be 40 nM for Sp1 (target sequence GGGGCGGGG) and 2 nM 
for zif268 (target sequence GCGTGGGCG). Measurements of binding affinities 
confirmed that F3 of EP2C bound GTG and GCG equally well (Kd values of 2 nM), but bound 
GAG with a two-fold lower affinity (Figure 4B). Finger 2 was very specific for the GTG 
triplet, binding 15-fold less tightly to a GGG triplet (compare 2C0 and 2C3 in Figure 4B). 
Finger 1 was also very specific for the GCT triplet; it bound with 4-fold lower affinity to 
a GAT triplet (2C4) and with 2-fold lower affinity to a GCG triplet (2C5). This example 
shows, once again, the high degree of correlation between site selection results and 
binding affinities. 
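
The comparisons in this Example reduce to two operations: identifying which finger's subsite differs between a variant target and the cognate target, and expressing the change in affinity as a ratio of Kd values. A minimal Python sketch follows. The variant sequence is constructed here by substituting GCG for the finger 1 triplet of the cognate site and is illustrative only; the actual variant sequences appear in Figure 4B. The Kd values of 2 nM (cognate) and 4 nM (2C5) are those stated in the text, and the function names are hypothetical.

```python
def finger_subsites(target):
    """Split a 9-bp target (5'->3') into the triplets read by F3, F2 and F1."""
    target = target.upper()
    if len(target) != 9:
        raise ValueError(f"expected a 9-bp target, got {target!r}")
    return {"F3": target[0:3], "F2": target[3:6], "F1": target[6:9]}

def changed_fingers(cognate, variant):
    """List the fingers whose subsites differ between two targets."""
    a, b = finger_subsites(cognate), finger_subsites(variant)
    return [finger for finger in ("F1", "F2", "F3") if a[finger] != b[finger]]

def fold_weaker(kd_variant_nM, kd_cognate_nM):
    """Ratio of dissociation constants; values above 1 mean weaker binding."""
    if kd_cognate_nM <= 0:
        raise ValueError("Kd must be positive")
    return kd_variant_nM / kd_cognate_nM

cognate = "GCGGTGGCT"              # EP2C target: F3 = GCG, F2 = GTG, F1 = GCT
variant = cognate[:6] + "GCG"      # illustrative variant with GCG at the F1 subsite
print(finger_subsites(cognate))
print(changed_fingers(cognate, variant))
print(fold_weaker(4.0, 2.0))       # Kd of 4 nM versus 2 nM
```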

Example 6: Evaluation of engineered ZFPs by in vivo functional assays 

To determine whether a correlation exists between the binding affinity of an 
engineered ZFP to its target sequence and its functionality in vivo, cell-based reporter 
gene assays were used to analyze the functional properties of the engineered ZFP EP2C 
(see Example 5, supra). For these assays, a plasmid encoding the EP2C ZFP, fused to a 
VP16 transcriptional activation domain, was used to construct a stable cell line 
(T-Rex-293™, Invitrogen, Carlsbad, CA) in which expression of EP2C-VP16 is inducible, as 
described in Zhang et al., supra. To generate reporter constructs, three tandem copies of 
the EP2C target site, or its variants (see Figure 4B, column 2), were inserted between the 
MluI and BglII sites of the pGL3 luciferase-encoding vector (Promega, Madison, WI), 
upstream of the SV40 promoter. Structures of all reporter constructs were confirmed by 
DNA sequencing. 

Luciferase reporter assays were performed by co-transfection of luciferase 
reporter construct (200 ng) and pCMV-βgal (100 ng, used as an internal control) into the 
EP2C cells seeded in 6-well plates. Expression of the EP2C-VP16 transcriptional 
activator was induced with doxycycline (0.05 µg/ml) 24 h after transfection of reporter 
constructs. Cell lysates were harvested 40 hours post-transfection, luciferase and 
β-galactosidase activities were measured by the Dual-Light Reporter Assay System 
(Tropix, Bedford, MA), and luciferase activities were normalized to the co-transfected 
β-galactosidase activities. The results, shown on the right side of Figure 4B, showed that 
the normalized luciferase activity for each reporter construct was well correlated with the 
in vitro binding affinity of EP2C to the target sequence present in the construct. For 
example, the target sequences to which EP2C bound with greatest affinity (2C0 and 2C2, 
Kd of 2 nM for each) both stimulated the highest levels of luciferase activity, when used 
to drive luciferase expression in the reporter construct (Figure 4B). Target sequences to 
which EP2C bound with 2-fold lower affinity, 2C1 and 2C5 (Kd of 4 nM for each), 
stimulated roughly half the luciferase activity of the 2C0 and 2C2 targets. The 2C3 and 
2C4 sequences, for which EP2C showed the lowest in vitro binding affinities, also 
yielded the lowest levels of in vivo activity when used to drive luciferase expression. 
Target 3B, a sequence to which EP2C does not bind, yielded background levels of 
luciferase activity, similar to those obtained with a luciferase-encoding vector lacking 
EP2C target sequences (pGL3). Thus there exist good correlations between binding 
affinity (as determined by Kd measurement), binding specificity (as determined by site 
selection) and in vivo functionality for engineered zinc finger proteins. 
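
The normalization step of the reporter assay can be illustrated with the short Python sketch below, which divides each luciferase reading by the β-galactosidase reading from the same lysate and then expresses each construct relative to a reference construct. The function names and all raw readings are hypothetical and are not data from Figure 4B.

```python
def normalized_activity(luciferase, beta_gal):
    """Luciferase signal corrected for transfection efficiency."""
    if beta_gal <= 0:
        raise ValueError("beta-galactosidase reading must be positive")
    return luciferase / beta_gal

def relative_to_reference(readings, reference):
    """Express each construct's normalized activity relative to a reference.

    readings maps a construct name to (luciferase, beta_gal) raw values.
    """
    normalized = {
        name: normalized_activity(luc, bgal) for name, (luc, bgal) in readings.items()
    }
    ref_value = normalized[reference]
    return {name: value / ref_value for name, value in normalized.items()}

# Hypothetical raw readings (arbitrary light units).
readings = {
    "2C0": (8000.0, 400.0),
    "2C5": (5000.0, 500.0),
    "pGL3": (200.0, 400.0),
}
relative = relative_to_reference(readings, reference="2C0")
for name, value in relative.items():
    print(f"{name}: {value:.3f} of 2C0 activity")
```

Normalizing before comparing constructs matters because raw luciferase counts vary with transfection efficiency from well to well.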
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XVv_7X>X X J [XVJ— s > > 


EROHLAT 3 61 

JL_IJLVV^X±J Ln. J. _ J W __ 


1000 

_L. w w \J 


14ft 


TGCGGGGCAA 44 


OSGSLTR 

V^y O VJ O XJ X IV 


1 50 


RGDHLKD 2 56 


ERDHLRT 3 62 


1000 

J- \J \J \J 


1 £7 
,3 0/ 


GGGGGCGGG 


45 


xvkjxjxxxj x xv 


1 SI 

J )X 


D^QHLTR 2 57 

LJ O VUX XXJ X XV ' / 


RSDHLOR 3 61 


60 


^ £T Q 


GAGGGGGCG 


46 


PSnPT.TP 

IvOXJIjXJ X XV 


1 59 


XvOX^XXXJ X XV z. JO 


PCJDNT.TP 164 

XV 0 1-/XM XJ X XV J U 


1 5 


"3 £ Q 

O D Z? 


GTAGTTGTG 


47 


PQDZiT.TP 

X\.DL/riiJ X Xv 


i m 
i j j 


TfldqT.AP 9SQ 

XwOOXxrixv Zi ZJ y 


n^n^LTP i^5 




17 0 


GTAGTTGTG 


48 


PSDAT.TP 

rLULiriJLi x xv 


1 54 

X 


NRATLAR 260 

X\l XVii. X XXC1XV j£j VJ V 


QSASLTR 3 66 

V> Uxik' X _i X XV JUU 


300 

J u u 


171 

O / X 


GTAGTTGTG 


49 


RSDALTR 

XV L/iiJ — 1 X IN- 


155 


NRATLAR 261 


OSGSLTR 3 67 


175 


179 


GTAGTTGTG 


50 


RSDSLLR 

IV kJ J_> k_? J — 1 -! — IX Y 


156 

X. *J V 


TGGSLAR 2 62 


OSASLTR 3 68 


112 5 


171 


GTAGTTGTG 


51 


PQDST.T.R 

Xv o u O XJ XJ XV 


1 57 
i j / 


NPATTiAR 9 61 


OSASTiTR 169 

v,^ kjnu j_j x iv «j vj _x 


120 

»J £a \J 


"5 *7 A 


GCTGAGGAA 


52 


OP CMT .VP 
i^xvoxSIxj V XV 


1 5ft 


PqDTJT.TP 9^4 
XvOxJINXJ X Xv ZOt: 


"TCcrrTtDP 170 

X O OIZjXJVj^XV O / 


1 1 

-D.J) 


o "7 ir 


GAGGAAGAT 


53 


SkISkJ oiNI xxHxv 


ICQ 
X D _? 


^OvjjiVlXJ^Xv ZDj 


P CITTMT .TP 171 
xv O xJl>J Xj X xv / X 


ft 5 


*± u x 


GTAGTTGTG 


54 


P QDZaT.TP 

XVOiJi-iXJ X XV 


i £0 


Tnn^T.AP 9^^ 

X vJVjrOXJr_.Xv __ D D 


D^A^T.TP 179 

V^OiMOXJXXv O / _- 


ft 0 


4fc U J 


GTAGTTGTG 


55 


T? QDQT.T.P 

XvOXJOXjXjXV 


1 £1 
X D X 


"KTPZXTT.AP 9^7 


Ocinc!T,TP 171 


7 5 0 


*± ^ X 


GTAGTTGTG 


56 


I )r~i\ Jti i h i r\ 


1 £9 


TH^IQT.AP 9 fift 
J.uuOJ_r_A __ D O 


n c ?r^QT 1 r pp 174 

yO\JO XJ X Xv O 1 *± 


500 

*J V/ \J 


A O O 
*± Z Z 


GTAGTTGTG 


57 


P SDST J iP 
XvDXJD 1 1 1 IXv 


1 £1 

ID J 


X OvJJO U X XV Zi u y 


O^G^LTP 175 


9 00 

<i U U 




GTAGTTGTG 


58 


P QDAT.TP 

XVOXJJ-iXJ X XV 


1 £4 

X o *± 


TdfiClT.AP 970 

X VJVJOXXrtJtv __ / V/ 


OPQAT.AP 17^ 


1 000 


A O /l 


GATGCTGAG 


59 


P CTii\TT .TP 
xvOUIMXj X xv 


1 £^ 


TQC-pT.piP 971 

XOOXjXJ^Xv __ / X 


TQANT.CiP 177 

X O-rtXNXJOXV Off 


i no 

X VJ u 


/IOC 


GATGCTGAG 


60 


P QD"KTT.TP 

xvDxJXnIXj X xv 


1 

luu 


nQQDT.nP 9 79 


Onci'NTT.AP 17ft 

^v^oXNXXriXv -J / O 


9 5 


A O ^ 


GATGCTGAG 


61 


PQDMT.TP 
xvOxJInXj X xv 


1 £7 
ID / 


OQQDT.OP 971 
OxJxjv^xv __ / o 


TCIAMT.CJP 17Q 

X OinXM XJ O xv Jj / y 


5 5 


A O 7 


GCTGAGGAA 


62 


OP QNtT ATP 

XV O IN U V XV 


1 £ft 
X D O 


PCJDTJT.TP 9 74 

XvOX-/XN XJ X Xv __ / *± 


O^ciDT.nP IftO 


1 

X 


A O Q 


GAAGATGAC 


63 


DQQMT.TP 

XJO OXMXj X XV 


1 £Q 

X O _7 


nnQ'MTiAP 97 5 

V^>^OX\IXJJr_xv Zt I ZD 


DP CIMT A7P 1 ft 1 

V^XVOXM XJ V XV O O X 


1 90 

X VJ 


A O Q 




£ A 


U O OXM XJ X Xv 


1 7(1 
X / u 


TQANT.ciP 9 7^ 

X O-^XMXJOXv __ / o 


OP CIMT .VP 1 ft 9 

v^/XVOXMXJ V XV JO<6 


50 

J VJ 


a ^ n 


GATGACGAC 


65 


XL xVriX\ XJ X Xv 


1 71 

X / X 


DQQTvTT.TP 9 77 
UO OXMXJ X Xv z / / 


nnQ'KrT.AP ifti 

^^OXNlXJrtXv OOO 


9 5 0 




GACGACGGC 


66 


XJovxtlXj Xxv 


1 to 

X /z 


UKoJNXjxIjK A to 


JJ O OVi Xj X xv jO*t 


inn 


4t o z 


GACGACGGC 


67 


"PiCPTJT TP 
JJ O vJxTXj X XV 


X / O 


iJxrLHXNXj^-lxv __ / _7 


XJOOIVIxjXxv JOJ 


1UUU 


433  GACGACGGC  68  DSGNLTR  174  DHANLAR  280  DSSNLTR  386  1000
434  GACGGCGTA  69  QSASLTR  175  DSGHLTR  281  EKANLTR  387  152.5
435  GACGGCGTA  70  QSASLTR  176  DSGHLTR  282  ERGNLTR  388  150
436  GACGGCGTA  71  QRSALAR  177  DSGHLTR  283  EKANLTR  389  95






4 3 7 


ri7\ GTA 7 2 


ORSALAR 


178 


4*} ft 

T O O 




RSDELTR 

IV U X_l J_I -L -LV 


179 


4 4 0 

"± "± U 


nppGAGGTGf 1 74 


RSDSLLR 


180 

JL VJ V 


441 


GGTGGAGTCA 75 


DSGSLTR 


181 


44 S 


gtpgpagtga 7 6 

VJT X \_*VjV_JT,\J -L VJ.TT. / \J 


RSDSLRR 


182 

X. U u 


4 r n 


f^APTTPftTfin 7 7 


RCiDTT.AR 

IV J_> X X_Lrt.XV 


183 

J- VJ —J 


*± D J 




DP^AT.AR 


1 R4 

J_ U t: 


4 £ 1 


f^Af^TAPT(TTA 7 9 


^JA.kJl±J_J X X 


185 

X U J 


4£3 

t: O j 


PTPPAGHAGA RO 


RSDMLTR 


186 

X U v 


4 £4 


nTGGAnnAGA r i 


RSDNLTR 

IVkJ L/l« XJ X XV 


187 


4 

*x O O 


PAnfirTGCGC R2 


RSDDLTR 


188 

X u u 


4£7 


PAGOPTGrGr R3 


RSDELTR 


189 

-L 


4 £ ft 
*± D O 


V^.rt.O\Jv- X Vjv-VJV.* o *± 


RuLVL/JJ X XV 


1 90 

X ^ \J 


A £T Q 

*± O _7 


paahapptpt ftR 


U Xv O jTlXxro-iv 


1 91 

x?x 


4 79 


f^APfJTPTfiGA ftfi 


R^^TTT.TT 

IVkJlJlXiJl X 


192 

X, ~f 


4 7 £ 


f^Af^AnniATG R7 
vj vjrt. vjrt. vjt \jrrt. iu o / 


TT^NT .PR 

X X kJXvXJXVXv 


1 93 


Ann 


rznanarinaTn ft ft 

OvJ.rt.vxrt.vJvXrt. X vJ O O 


X X OXNXJXvXv 


1 94 
j / ^ 


4 7ft 


f^PAHAPnATG ftQ 

vJ vJ-rt.vXrt.vJ \JJr\ X VJ O -7 


TTSNT iRR 

X X uJ>X\J XJXVXV 


1 95 


47 Q 


vj X vjvj^VjvxtIv^ v^ -7 W 


LJ O \D1M XJ X XV 


1 9fi 


4 ft n 


nTPrtpnPAPP qi 

vj X vJvJv^vjvJirlA_»v_, y X 


LJ O OXN XJ X Xv 


1 97 


/I QO 


vxH.vj vJ vJ\_ vJrti-lvJ _7 ^ 




1 Qft 


4 ft4 


vjjrt-vJv^vj v^ vs.rtjrt.vx .? — > 




1 99 


4 ft R 
f± O D 


Pf2Af2AnniTTT Q4 

VjV3rt.VXrt.VJVJ X XX 


OCJ^AT.AR 

O OjraXXrt.lv 


2 00 

W V/ 


A ft "7 


vxvXrt.vJrt.vJvJ XXX y -D 


"NTR ATT.AR 

Vi XV£*L X XXrt.IV 


9 m 


A ft ft 


X vjvj X i^vJ vjvjvJvj D 


iv o xj n xjjrtxv 


9 09 


^ q r» 


n-i7\ /-ipririprppri qh 

X.A.vXvJvJvJvxX UU 1? / 


XvOxJOXjXjIv 


Z VJ o 


c; n ^ 


vJv-v^vJrt.vJvJ± vj\_» -7 O 


ivO L> O XjXJXV 


9 04 
u ^± 


504  GCCGAGGTGC  99   RSDSLLR  205
505  GCCGAGGTGC  100  RSDSLLR  206
526  GCGGGCGGGC  101  RSDHLTR  207
543  GAGTGTGTGA  102  RSDLLQR  208

DSGHLTR  284  ERGNLTR  390  117.5
RSDHLTT  285  RSDNLTR  391  62.5
RSKNLQR  286  ERGTLAR  392  40
QSGHLQR  287  TSGHLTR  393  250
QSSDLQK  288  DSGSLTR  394  1000
RGDALTS  289  DRSNLTR  395  130
QSGHLQR  290  DSSKLSR  396  150
DRSNLRT  291  RSDNLAR  397  120
RSDNLAR  292  RSDALAR  398  0.5
RSDNLAR  293  RSDSLAR  399  0.4
QSSDLQR  294  RSDNLRE  400  65
QSSDLQR  295  RGDHLKD  401  800
QSSDLQR  296  RGDHLKD  402  42
RSDNLAR  297  QSGNLTR  403  13.5
DRSALAR  298  RSDNLAR  404  80
RSDNLAR  299  QSDHLTR  405  80
RSDNLAR  300  QRAHLAR  406  100
RSDNLAR  301  QSGHLRR  407  60
RSDELQR  302  RSDALAR  408  8.5
RADTLRR  303  RSDALAR  409  5
ESSKLKR  304  RSDNLAR  410  130
ESSKLKR  305  RSDNLAR  411  1000
RSDNLAR  306  QRAHLAR  412  110
RSDNLAR  307  QSGHLAR  413  76.9
RSDNLTT  308  RSDHLTT  414  35
RSDHLTR  309  RSDNLTT  415  1.5
RSDNLAR  310  ERGTLAR  416  50
RSDNLAR  311  DRSDLTR  417  2 5
RSDNLAR  312  DCRDLAR  418  65
ERGHLTR  313  RSDTLKK  419  8
MSHHLKE  314  RSDHLSR  420  50






544 GAGTGTGTGA 103 RSDSLLR 209 

545 GAGTGTGTGA 104 RKDSLVR 210 

546 GAGTGTGTGA 105 RSDLLQR 211 
54 7 GAGTGTGTGA 106 RKDSLVR 212 
54 8 GAGTGTGTGA 107 RSSLLQR 213 

549 GAGTGTGTGA 108 RSSLLQR 214 

550 GAGTGTGTGA 109 RKDSLVR 215 

551 GAGTGTGTGA 110 RSDLLQR 216 

552 GAGTGTGTGA 111 RKDSLVR 217 

553 GAGTGTGTGA 112 RSDSLLR 218 

554 GAGTGTGTGA 113 RKDSLVR 219 

558 TGCGGGGCA 114 QSGDLTR 22 0 

559 GAGTGTGTGA 115 RSDSLLR 221 

560 GAGTGTGTGA 116 RSSLLQR 222 

561 GAGTGTGTGA 117 RKDSLVR 223 

562 GAGTGTGTGA 118 RSDSLLR 224 
565 GATGCTGAG 119 RSDNLTR 225 

567 GAAGATGAC 120 EKANLTR 226 

568 GATGACGAC 121 EKANLTR 227 

569 GTAGTTGTG 122 RSDSLLR 228

MSHHLKE 315 RSDNLAR 421 125
TSDHLAS 316 RSDNLTR 422 32
MSHHLKT 317 RLDGLRT 423 500
TSGHLTS 318 RSDNLTR 424 500
MSHHLKT 319 RSDHLSR 425 500
MSHHLKE 320 RSDHLSR 426 500
TKDHLAS 321 RSDNLTR 427 20
MSHHLKT 322 RSDHLSR 428 50
MSHHLKT 323 RSDNLTR 429 31
MSHHLKE 324 RSDNLTR 430 125
TSDHLAS 325 RSDNLAR 431 62.5
RSDHLTR 326 DSGHLAS 432 21
TSDHLAS 327 RSDNLAR 433 1000
MSHHLKT 328 RSDHLSR 434 500
MSHHLKE 329 RSDNLAR 435 1000
TSGHLTS 330 RSDNLAR 436 1000
TSSELQR 331 QQSNLAR 437 100
TSANLSR 332 QRSNLVR 438 47.5
DSSNLTR 333 TSANLSR 439 300
TGGSLAR 334 QRSALTR 440 52



TABLE 2

SBS# TARGET SEQ ID F1 SEQ ID F2 SEQ ID F3 SEQ ID Kd (nM)

201 GCAGCCTTG 441 RSDSLTS 646 ERSTLTR 851 QRADLRR 1056 1000
202 GCAGCCTTG 442 RSDSLTS 647 ERSTLTR 852 QRADLAR 1057 1000
GCAGCCTTG 443 RSDSLTS 648 ERSTLTR 853 QRATLRR 1058 1000
204 GCAGCCTTG 444 RSDSLTS 649 ERSTLTR 854 QRATLAR 1059 1000
205 GAGGTAGAA 445 QSANLAR 650 QSATLAR 855 RSDNLSR 1060 80
206 GAGGTAGAA 446 QSANLAR 651 QSAVLAR 856 RSDNLSR 1061 1000
207 GAGTGGTTA 447 QRASLAS 652 RSDHLTT 857 RSDNLAR 1062 70
208 TAGGTCTTA 448 QRASLAS 653 DRSALAR 858 RSDNLAS 1063 1000
209 GGAGTGGTT 449 QSSALAR 654 RSDALAR 859 QRAHLAR 1064 35
210 GGAGTGGTT 450 NRDTLAR 655 RSDALAR 860 QRAHLAR 1065 65
211 GGAGTGGTT 451 QSSALAR 656 RSDALAS 861 QRAHLAR 1066 140
212 GGAGTGGTT 452 NRDTLAR 657 RSDALAS 862 QRAHLAR 1067 400




211 

X. J 


GTTGCTGGA 

Ul X n 1 V- ' X VJ VJXi 


453 


ORAHLAR 658 


OSSTLAR 863 


OSSALAR 1068 


1000 


iy 


914 


Or x A. uv^ x 


4 54 


OPAHLAR 6 59 


OSSTLAR 864 


NRDTLAR 1069 


1000 

•Jm \J v V 




9 1 R 




4 R R 
*± o 


J.\i XvxJ l±XJi T l v U U U 


DRSATtAR 8 65 


OSANLSR 107 0 

V^ 1 — 'iiiM J — 1 k_/ XV X. \J I \J 


1000 

X. \J \J \J 




9 1 £ 

Z ID 


Ha AfTTPTGT 

vjj-rt-rlor x x vjr x 


4 56 


NRDHLTT 661 

XN X\.XVXXXJ XX u u ± 


DRSALAR 866 


OSANLSR 1071 


1000 




91 7 




4 57 


OR S ALAR 662 


DRSALAR 867 


RSDNLAR 1072 


40 




91 Q 


vxtt. x or x x Out. x 


4 58 

T JO 


S^St tJJ-" xxcixv uu J 


NRDTLAR 8 68 


NRDNLSR 1073 

±M X VX-/XM XJ U ±\. X W / — ^ 


1000 

_L V/ W V 




9 9 n 

/1j Za KJ 


VJ.rt. X \ J X X VJ-T^. X 


4 5 9 


OOSNLAR 664 


NRDTLAR 869 


OOSNLSR 1074 


1000 

_L. \J w 




221 GATGAGTAC 460 DRSNLRT 665 RSDNLAR 870 NRDNLAR 1075 1000
222 GATGAGTAC 461 ERSNLRT 666 RSDNLAR 871 NRDNLAR 1076 1000
223 GATGAGTAC 462 DRSNLRT 667 RSDNLAR 872 QQSNLAR 1077 105
224 GATGAGTAC 463 ERSNLRT 668 RSDNLAR 873 QQSNLAR 1078 1000
225 TGGGAGGTC 464 DRSALAR 669 RSDNLAR 874 RSDHLTT 1079 6
226 GCAGCCTTG 465 RGDALTS 670 ERGTLAR 875 QSGSLTR 1080 1000
227 GCAGCCTTG 466 RGDALTV 671 ERGTLAR 876 QSGSLTR 1081 1000






228 GCAGCCTTG 467 RGDALTM 672
229 GCAGCCTTG 468 RGDALTS 673
230 GCAGCCTTG 469 RGDALTV 674
231 GCAGCCTTG 470 RGDALTM 675
232 GGTGTGGTG 471 RSDALTR 676
233 GGTGTGGTG 472 RSDALTR 677
235 GTAGAGGTG 473 RSDALTR 678
236 GGGGAGGGG 474 RSDHLAR 679
237 GGGGAGGCC 475 ERGTLAR 680
238 GGGGAGGCC 476 ERGTLAR 681
239 GGCGGGGAG 477 RSDNLTR 682
240 GCAGGGGAG 478 RSDNLTR 683
242 GGGGGTGCT 479 QSSDLRR 684
243 GTGGGCGCT 480 QSSDLRR 685
244 TAAGAAGGG 481 RSDHLAR 686
245 TAAGAAGGG 482 RSDHLAR 687
246 GAAGGGGAG 483 RSDNLAR 688
247 GAAGGGGAG 484 RSDNLAR 689
276 GCGGCCGCG 485 RSDELTR 690
277 GCGGCCGCG 486 RSDELTR 691
278 GCGGCCGCG 487 QSWELTR 692
279 GCGGCCGCG 488 QSWELTR 693
280 GCGGCCGCG 489 QSGSLTR 694
281 GCGGCCGCG 490 QSGSLTR 695
282 GCAGAAGTG 491 RGDALTR 696
283 GCAGAAGTG 492 RSDALTR 697
284 GCGGCCGCG 493 QSGSLTR 698
285 TGTGCGGCC 494 ERGTLAR 699
287 GCAGAAGCG 495 RGPDLAR 700
288 GCAGAAGCG 496 RGPDLAR 701
289 GCAGAAGCG 497 RGPDLAR 702

KPCTrTiAR ft 77


OSRSIiTR 1082 


1000 

X \J \J \J 


"FPfTTTiAP 8 78 

XjIvvJ X xxrlxv O / O 


pqUPl.TP 10 8"} 

XvOX-'XjXJ X Xv x v U J 


1 000 

J- \J \J \J 


RPfTTTiAP 87 9 

I_iX\.Vj X XXTlXv (_> / ^/ 


RSDELTR 10 84 

XV k_J X-/ 1 X_l XJ XXV X U u ~ 


1000 

JL \J \J \J 


XllXVVJ X XJxaXV (J KJ \J 


RSDELTR 1085 

XVk_J X_/i_ JXJ X XV _1_ V VJ _/ 


1000 

X \J \J \J 


PQTiATiAP 881 

r\ t )f-\ 1 \f-\ i\ o O JL 


KTR^TTTAP i nfifi 

XN XV t_? XXXXTiXV 1UOU 


R 0 


PQTlAT.AP ftftO 
rs. o iJi"i xxrilv OOZ 


DAQPTT.AP 1 087 
vj /~\ n n i i/-a i\ 1UO / 


1 00 

_L \J u 


PQTTNfT.AP ftft'} 

IvO J_Jx\l xxri.lv OO J 


OPHALAP 1088 


80 


PQrYKTT.&P 8 8 A 

XV O UxN XXri.lv O O *± 


pcjtyht.cip i 08Q 

XvOX^XXXjOXv X VJ O ^7 


0 ^ 


K r-> 1 JIM 1 iHK OO J 


PCl'R'RT.ciP 1 090 

xvO i-zixXJ O XV 1U 


0 ^ 


PCJDNTiOP 886 


RSDHLSR 10 91 

XV kJ X^X XX-I UJ XV X. \J *J JL 


0 8 


XVO X-/XXXJ X XV O O / 


■np QTJT.AR 1 0 99 

JJlVul XJ-XciXV _L VJ 


0 4 


XV k_> X-^X 1 J_l xv uuu 


OSGSLTR 1093 


1 


nQCXJT,7Vp QOQ 
O OliXXrtXV OO J 


XVOXJXXXJOXV _L VJ y 1 


l 

X. 


RDCUT AD QQO 


xa. O JJ/lixriK JL U _? D 


7 


C\ Q HttfT TP QQ1 


nCnivTT.PT 1 AQ^ 
yoo±\IJ_i.Kx XU^D 


1 0 O 
1UU 


PtQaMT TP QQO 
\) o/lxN xj 1 JX o y A 


Pi c; fi'MT .P T 1 flQ7 


0 L R 


xvDiJxT 1 ,i/-ii\. 0_?0 


^ O vJXN XJ X xv 1U JO 


o 


xvOxJxiXJ/-ixv 0-/*± 


Vp/ O VJJlN XJ PLxv 1U J J 


O 


IliXVVJ 1 XXrirv O y D 


pQnpppfP iinn 

XVOX^XltXVXN-LV lXVU 


Q 0 


TlDCCT TD QQf 

JJKobxjlK OJO 


pcpippTfp 1 1 m 


1 07 


JbKoxlxfiK oy / 


pcnpptfp i i oo 




PiP C CT .TP QQQ 
UivO O -Li X xv O J? O 


pQ*nppTTP 1 1 


9 £ 0 

D U 


"FPfVTT.AP 899 

X_iivvJ X Xxrixv O Zs 


P^D'R'PK'P 1 1 04 

iXO X^X_iXVX\_L\. _l LUl 


1 fiO 

X O VJ 


tyrqqt.tp 900 

xJxvO O XJ x xv y u u 


PClFlPPTrP 1 1 OR 

XV O X^XjXVXVXV llu J 




r\CAMT TP Q01 


VO/ixJj_Lrixv JL J_ U D 


10 0 0 
1UUU 


PiQnTvTT.TP Q09 


HQr^QT.TP 1 1 07 


o 
z 


pcmiJT.TT 9 0*3 

xvDJJxxxj X 1 <?Uj 


pcmpPTTP 1108 

xvD xJxjxvxvxv 11UO 


1 000 

X VJ VJ u 


RSDELTR 904 SRDHLQS 1109 1000
QSANLTR 905 QSGSLTR 1110 1000
QSANLTR 906 QSGSLTR 1111 1000
QSGNLQR 907 QSGSLTR 1112 800

290 GCAGAAGCG 498 RSDELAR 703
292 GCAGAAGCG 499 RSDELTR 704
293 GTGTGCGGC 500 DRSHLTR 705
296 TGCGCGGCC 501 ERGTLAR 706
297 TGCGCGGCC 502 ERGTLAR 707
298 GCTTAGGCA 503 QTGELRR 708
299 GCTTAGGCA 504 QTSDLRR 709
300 GCTTAGGCA 505 QTADLRR 710
301 GCTTAGGCA 506 QSADLRR 711
302 GCTTAGGCA 507 QSGSLTR 712
303 GCTTAGGCA 508 QTGSLTR 713
304 GCTTAGGCA 509 QTADLTR 714
305 GCTTAGGCA 510 QTGDLTR 715
306 GCTTAGGCA 511 QTASLTR 716
307 GAAGAAGCG 512 RSDELRR 717
308 GAAGAAGCG 513 RSDELRR 718
309 GGAGATGCC 514 ERSDLRR 719
310 GGAGATGCC 515 DRSDLTR 720
311 GGAGATGCC 516 DRSTLTR 721
312 GGAGATGCC 517 ERGTLAR 722
313 GGAGATGCC 518 DRSDLTR 723
314 GGAGATGCC 519 DRSSLTR 724
315 GGAGATGCC 520 ERGTLAR 725
316 GGAGATGCC 521 ERGTLAR 726
318 TAGGAGATGC 522 RSDALTS 727
319 GGGGAAGGG 523 KTSHLRA 728
320 GGGGAAGGG 524 RSDHLTR 729
321 GGCGGAGAT 525 TTSNLRR 730
323 GGCGGAGAT 526 TTSNLRR 731
324 GGCGGAGAT 527 TTSNLRR 732
325 GTATCTGCT 528 NSSDLTR 733

QSANLQR 908 QSADLAR 1113 1000
QSANLQR 909 QSGSLTR 1114 1000
ERHSLQT 910 RSDALTR 1115 320
RSDELTR 911 DRDHLQS 1116 1000
RSDELRR 912 DRSHLQT 1117 500
RSDNLQK 913 TSGDLSR 1118 4000
RSDNLQK 914 QSSDLQR 1119 4000
RSDNLQR 915 QSSDLSR 1120 400
RSDNLQT 916 QSSDLSR 1121 350
RSDNLQT 917 QSSDLSR 1122
RSDNLQT 918 QSSDLSR 1123 135
RSDNLQT 919 QSSDLSR 1124 230


R^DNT.OT 99 0 


OSSDTiSR 1 1 9 S 


930 

Zj «_} Vf 


RSDNLOT 921 


OSSDLSR 1126 

\^ i^J X-/ XJ L_J XV J L ^ VJ 


280 

Zj U V 




OSHNTiSR 1197 




QSANLQR 923 QSANLQR 1128 1000


OSSNLOR 994 


OSnHLSR 119 9 


4000 

V V V 


NRDNLQT 925 QSGHLSR 1130 1000
NRDNLQR 926 QSGHLSR 1131 170
NRDNLQR 927 QSGHLSR 1132 2000
QRSNLQR 928 QSGHLSR 1133 1000


nci^KrT.nR Q9Q 

^kj kJlN XJ^JA. ~s £a 


^OOXxxJOlV X X J *l 


117 R 


OSSNLOR 93 0 


OSGHLSR 113^ 

S^'*-' vJXXXJkJ XV XX ) ' 


96R 


ORDNLOR 931 


OSGHLSR 113 6 

V^l u V^XXXJ kJ XV X X —J \J 


3000 
\j \j \j 


XV O XJxM XXrt.XV ^ J o 


"RSDNT.AS 1137 


1 00 

X w w 


nQGKir.cjp qh 

ViO OIMxjOIv -? j j 


IvOxJIxxjOXv IIjO 


1 

1Z J 


rjciCTNTT.cjR Q14 

O VJ1M XJuJa. -? ._> ^± 


xvOx^XXJ_lkDxV J L Jj _7 


c 


QSGHLQR 935 DRSHLTR 1140 200
QSGHLQR 936 DRDHLTR 1141 600
QSGHLQR 937 DRDHLTR 1142 200
NSDVLTS 938 QSDVLTR 1143 1000

3 9 6


vjlril Vw X VJ X -L 


529 


NSDALTR 734 


NSDVLTS 939 


797 


TPTGrTGGG 

J. V J. VJ V_ _L VJ\J\J 


530 


RSDHLTR 73 5 


NSADLTR 940 


7 9 ft 

JZ o 


TfTGTTGGG 

J.L-1 VJ X X vJvJvJ 


531 

~J _) X. 


RSDHLTR 73 6 

XV k_/ X-/X X J 1 X XV / *J VJ 


NSSALTS 941 


74 9 
o *± -? 


VJ \JT X VJ X V_Vj\_V_ 


539 


DPRDLAR 73 7 

l^J \ ■ xvx-*' xjjtu. V / — > J 


DSGSLTR 942 


O 3 U 


1 V- V- VxrlVJ VJ VJ ± 


R77 

Zj O -D 


T3GHT.TR 73 ft 

X O VJXlXJ X XV /JO 


RSDNLTR 94 3 

XV k_J X-/XN XJ X XV A ~J 


7 m 

O O X, 


VJ V_ X VJVJ lulU 


R74 

_> z> 


DSGST.TR 73 9 


TSGHLTR 944 

X w V3XXXJ X XV ^ " j: 




vJunuuuu x vJ 


535 

U J w) 


RSDSLLR 74 0 

XV kJ XV XJXJXV / ~ \J 


RSDHLTR 945 

XVlwfX— 'X XXJ X XV -s jl —J 


~J -J 


GTTGGAGCC 


536 
~j *—) \j 


DCRDIiAR 741 


OSDHLTR 946 


7 54 
»j _? ^± 


GAAGAGGAC 


537 
> — > t 


DSSNLTR 742 


RSDNLTR 947 


O ET [T 
J J J 


<^aagaggac 


53 ft 

JJU 


EKANLTR 743 

XJ IVfliN XJ X XV / i 


RSDNLTR 948 

XV k_? X~/*XM XJ X XV 


j DO 


VJvJV_ 1 V3VJVJV_,VJ 


57 9 

~J O -7 


RCIDFLRR 744 

IvOl^l-llJlViV / 


PSDHLTK 94 9 

XVOXJXlXJ X XV ^iJ 


j j / 


VJVJV, X VJVJVJV-.VJ 




PQDFT.PP 74 R 

XVOl^XLXJXVXV / T: J 


PSDHLTK 9^0 

XVOXyXXXJXXV -X J V/ 


7 R ft 

J JO 


GGCTGGGCG 


541 


RCIDPT.RR 746 

XVOXJi-f XJXVXV /TO 


RSDHLTK 951 

XVkJXyXIXJ X IV y J X 


7 61 

O D X 


GGGTTTGGG 


542 


RSDHLTR 747 

XVkJX-/XXX_ I X XV / ~ / 


OSSALTR 952 


767 
._> O O 


GGGTTTGGG 


543 


RSDHLTR 74 8 

XVkJ X_sXXJ_J X XV / ~ VJ 


OSSVLTR 953 

v^ kj v xj xxv ~j -J 


o £" yi 
O O *± 


GTGTCCGAAG 544 


PQDKTT.TP 74 Q 

JxOxJlAlXJ X XV / *± -7 


DCJAVLTT 9 c ;4 

L/Ori V 111 1 yZj*± 


o c a 

ODD 


GGTGCTGGT 


545 


PiAQT-TTTP 7^H 
<^i-ioiilj 1 xv /DU 


^jH.O V XJ X XV -7 J J 


*a £ £ 
Job 


GAGGGTGCT 


546 


V J_i x xv / D JL 


^i-io xll_i i. XV JJO 


£ *7 


GGGGGCGGG 


547 


PQTlT-TT.TP 7^9 


"nQriTTT-TP Q^7 

XJ O VJXlXJ IK J J / 


7 6 ft 
j o o 


GAGGGGGCG 


548 


P^HTFLTP 7R7 

IvOXJlliXJ X lv f ZJ S> 


XVOX-'XIXJ X XV z> o 


7 6 9 


GTAGTTGTG 


549 


RSDALTR 7 54 

i\ijJJriiJ X lv / jt 


TGGSTiAR 95 9 

X VJ VJ k_J X_liilV 


-J / u 


GTAGTTGTG 


550 


RSDATiTR 7SS 

XV O X-/XT.J 1 X LV / J J 


NRATLAR 96 0 

XM Ivn X X_lxxXV ^ U u 


O / x. 


GTAGTTGTG 


551 


PSDAT.TR 7 56 

ivOlJ^lXJ X Xv / VJ 


NRATTiAR 961 

l\lXViTl X XXTlJTV V) 1 


7 79 


GTAGTTGTG 


552 


PCIDGTJ.P 7R7 

IvOl^OlJXJXv i ZD j 


THHCIT.AP 969 

X OOOxXrlXV J/ u /■ 


"3 "7 *2 


GTAGTTGTG 


553 


PQPlQTTP 7^Tft 
iXoiJoxjxjK. / JO 


ATP AT 1 !. A P Q67 
IN K-ri. 1 xxrtxv !?Dj 


OTA 


GCTGAGGAA 


554 


HP QTJT A7P 7c:Q 

^XvOlN l_l V XV / J J 


xvOJJxNIxj x xv JOi 


OTC 

J / O 


GAGGAAGAT 


555 


nnQ'NTT.AP 76D 


V,^ O wlM xjv^xv 3D J 


377 GTGTTGGCAG 556 QSGSLTR 761 RGDALTS 966
378 GCCGAGGAGA 557 RSDNLTR 762 RSDNLTR 967
379 GCCGAGGAGA 558 RSDNLTR 763 RSDNLTR 968
380 GAGTCGGAAG 559 QSANLAR 764 RSDELTT 969

QSDVLTR 1144 1000
NSDDLTR 1145 1000
NSDDLTR 1146 1000
TSGHLTR 1147 1000
DCRDLTT 1148 332
TLHTLTR 1149 1000
QSDHLTR 1150 26
TSGALTR 1151 1000
QRSNLVR 1152 28
QRSNLVR 1153 20
DSDHLSR 1154 1000
DSDHLSR 1155 1000
DSSHLSR 1156 225
RSDHLTR 1157 130
RSDHLTR 1158 200
RSDSLTR 1159 1000
QASHLTR 1160 600
RSDNLTR 1161 1000
RSDHLQR 1162 60
RSDNLTR 1163 3.5
QSGSLTR 1164 95
QSASLTR 1165 300
QSGSLTR 1166 175
QSASLTR 1167 112.5
QSASLTR 1168 320
TSSELQR 1169 3.3
RSDNLTR 1170 85
RSDALTR 1171 89
DRSSLTR 1172 31
ERGTLAR 1173 3
RSDNLAR 1174 1000

381 GCAGCTGCGC 560 RSDELTR 765
383 TGGTTGGTAT 561 QSATLAR 766
384 GTGGGCTTCA 562 DRSALTT 767
385 GGGGCGGAGC 563 RSDNLTR 768
386 GGGGCGGAGC 564 RSDNLTR 769
387 GGCGAGGCAA 565 QSGSLTR 770
388 GGCGAGGCAA 566 QSGDLTR 771
390 GTGGCAGCGG 567 RSDTLKK 772
392 GTGGCAGCGG 568 RSDELTR 773
396 GCGGGAGCAG 569 QSGSLTR 774
397 GCGGGAGCAG 570 QSGDLTR 775
400 TCAGTGGTGG 571 RSDALAR 776
405 GCGGCCGCA 572 RSDELTR 777
406 GCGGCCGCA 573 RSDELTR 778
407 GCGGCCGCA 574 QSWELTR 779
408 GCGGCCGCA 575 QSWELTR 780
409 GCGGCCGCA 576 QSGSLTR 781
410 GCAGAAGTC 577 RSDALTR 782
411 GCGGCCGCA 578 QSGSLTR 783
412 GCGTGGGCG 579 QSGSLTR 784
413 GCGTGGGCA 580 QSGSLTR 785
414 GCAGAAGCA 581 RSDELTR 786
415 GTGTGCGGA 582 DRSHLTR 787
416 TGTGCGGCC 583 ERGTLAR 788
493 GGGGTGGCGG 584 RSDTLKK 789
494 GCCGAGGAGA 585 RSDNLTR 790
496 GGTGGTGGC 586 DTSHLRR 791
497 GTTTGCGTC 587 ETASLRR 792
498 GAAGAGGCA 588 QTGELRR 793
499 GCTTGTGAG 589 RTSNLRR 794
500 GCTTGTGAG 590 RSDNLTR 795

QSSDLQR 970 QSGDLTR 1175 1.5
RGDALTS 971 RSDHLTT 1176 1000
DRSHLAR 972 RSDALAR 1177 60


RSDTLKK 97 3 


RSDHLSR 1178 

X VlJX/l XXJv_s XV X, X. / VJ 


1 2 

X • 


RSDELQR 974 RSDHLSR 1179 0.4
RSDNLAR 975 DRSHLAR 1180 2.5
RSDNLAR 976 DRSHLAR 1181 28
QSSDLQK 977 RSDALAR 1182 20
QSSDLQK 978 RSDALAR 1183 1000


O^CTHT.OR 97 9 


RSDTLKK 1184 

XV uD LJ X XJ X VX V J — L U " 


18 8 

X vJ ■ u 


QSGHLQR 980 RSDTLKK 1185 25
RSDSLAR 981 QSGDLRT 1186 40
ERGTLAR 982 RSDERKR 1187 110
DRSSLTR 983 RSDERKR 1188 110
ERGTLAR 984 RSDERKR 1189 410


XJxvO O XJ J. XV O «J 


XV O X^JJjXvXVXv J — L V/ 


J u u 


T?Pf-VTT.Z\T? 

JZiXvO _L 1 'H r\ 27 O D 


XVOX^X-lXVXVXV 11^1 




n^tTNT.TP 9 Pi 7 


OSGSTtTR 1199 




RSDHLTT 988 RSDERKR 1193 1000
RSDHLTT 989 RSDERKR 1194 5
RSDHLTT 990 RSDERKR 1195 5
QSANLQR 991 QSGSLTR 1196 1000


X_jXVXX O i-l^yj X Z? Z> 


iX-iJ X_/rt.J_l X XV J L J7 / 


1 000 

_L \J \J \J 


PSDF1T.RR 99^ 

XV O i-/ X_J XJ X VX v „y _J 


DRSHLOT 1198 

X-/XV O X XXJ V^ X X X u 


1000 

X U U v 


P^D^LAP 994 


RSDTTTtSR 1199 

XV k_J X-/ X X J_J k_> XV XXy ^ 


-J V V 


RSDNLTR 995 DRSSLTR 1200 90
TSGHLQR 996 TSGHLSR 1201 1000
DSAHLQR 997 TSSALSR 1202 1000
RSDNLQR 998 QSGNLSR 1203 30
TSSHLQK 999 DTDHLRR 1204 1000
QSSNLQT 1000 DRSHLAR 1205 1000



vJT JL VJTVJVJVJV-J XX J./1




-TvO J_^ilXJ O Z\. Jl. U U X 


PQDAT.AP 1 9 OC 


Q 
O 




DUZ 


pppptppp a ^ q 9 


PQAPT.AP 7Q7 
v^OrtXlXxrtJ-v 1 y 1 


i\. O Ui-iXj/-iI\. xuuz 


P QTlT-JT CP 1 9P17 
KoXJxIXjoK XZU / 


c n 




n i 

jU / 


pa ppt apapp ^ q 9 


PCrYMT.AP 7QP 


ppqat.ap 1 nm 

v^lvoiAXji-LiA. 1UU j 


PCPkTvTT AP 1 9 Pi P 
K o UlN Xxf-iK XZUo 


i n 
X u 




DUO 


pa ppt apapp c; QZL 


PQDKTT.AP 7QQ 
XV O XJIN XJrt.iV i y y 


PiQatt.ap i nnzi 


PQPlTVTT.AP 19 0Q 
XvD xJl\ ij/iXv XZUj? 


1 o 
x u 




D U _? 


^ X L,vj X V\j X VjAjv- j^j 


P Cn UT.TT Qfifi 
I\.OL>T1Xj XX ouu 


PCRAT.AP 1 rtPiR 


PlPQAT AP 1 91 H 
iJ iX oiAXji-iiV X Z X U 


i pi n 

XUU 




o 

D X U 


PTTPAPPAAP RQC 


PQPKTT.AP P01 


PCITTMT.AP 1 nOC 


7VTPATT.AP 1911 
IM xvi-i X Xii-ixv X Z X X 


1 oo 

XUU 




t^l 1 
D X X 


PTTP APPA AP RQ7 


PQPTsIT AP QflO 
V O kjilM Xji-iJA. OUZ 


P QTiTCTT A P 1 H H 7 
iv o XJi\ Xji-iiX. XUU / 


PCCAT AP 1 91 9 
^OO^XxfaXv XZXZ 


i on 

XUU 




jIZ 


o\H.vjvj 1 bbfiiio D y O 




PQT1AT AP i n n P 
K.OXXttXorlI\. lUUO 


PQT^TvTT AP 1919 


i r\ 
x u 




D X O 


papptppaap ^qq 

Vj.rt.VjvJ X VJVJrtjrlVj Z) y y 


DQAKTLAP ftDZl 


PQDAT.AP 1 DDQ 

XV O I }f-\ 1 iM K X U U -7 


PQD'NTT.AP 1 9 1 ZL 
■LVO Uvi Xjj-ilv 1Z It: 


i r 

X . D 




R1 4 


j. /-loo X vjvj X VjVj O U U 


PQTlAT.TP film 

Xv u_> LJr\LJ X Iv OUZ) 


P QHAT.AP 1 fi 1 fi 

rg 1 JM i ih re X U X U 


PQD'KTT.TT 191^ 
XVOIJIN Xj XX IZIj 


1 0 
X u 




Ji J 


tpppappapt £ n i 

X VJ VJ VJi^ VJ VXri VJ X O U X 


P^TWT.TP ROC 

I\.OJJl\ XJ X i\. OUD 


PCJDMT.TP 1011 
rv O JJ1N Xj X Iv XUXX 


PQTlTTT.TT 1 91 C 
iVOX^XiXJ XX X Z X o 


0 R 
u . z> 


sstss. 

JP 


R1 C 


PPAPPAPPT C09 

vJvJrtVjVjrt.VjV_, X U UZ 


X X O J— iXJlviv O \J / 


ncipwi-np ini9 


pqriT-TT.C!P 1917 
^ O VjxxXj O xv XZX / 


7 O O 
/ U U 


hVw* 


j! / 


nnAriPTPPriri c n 9 

\jjVj.rt.VjV_, X VjVjVjvJ Ol/J 


PTTl'P'T.PP ftfift 




PQPT4T.CJP 1 91 P 
^OvjIIXjOIv XZXO 


o 

D U 


ft il ft 


^ 1 P 
3 X O 


ppppr'APPAP c oa 

VjVjVjVjVjrt.vJvJrt.vJ Qui 


PTPPTT.PP ftfiQ 


PQPTTT.PP 1 01 ZL 
v^o vjIxXjv^xv lull 


pq*nT4T.c:P 1 91 Q 
xvoxJIxIjOxv XZX_7 


9 0 
J U 




D X _7 


ppppappapa ^rm 

VJ VJ Vj OX-lO Vjrt.VJ.T-i. OU J 


PQTTKIT.AP fil fl 


PQ*n7vTT.QP 1 01 R 
IVOIJINJXjOXv 1U 1 J 


PQPiT-TT.QP 1990 
KoXJjtIXjOJX xzzu 


O 9 
U . O 


~ "» I? 


C9H 
DZU 


PPAPPZ\(1Z\T coc 

VJ ort-Vj VJ-rt. Vjrt. X DUO 


TTAMT PP PI 1 
X X jriiM XjXvxv Oil 


PQPTJT PP 1 OI C 
V^Oo^rlXjVjJlv XUXD 


PCPT-TT CP 199 1 
^ovxtLXjoK. XZZX 








ppappappa co7 

vJv^rt.vJv-rtVjVJrt. Du / 


Pi T 1 PUT PP Q1 O 
V X vjxIXjKK. OlZ 


nCPPT PP 1 A1 7 
(y/OVJliXjyK XUX / 


PCPPT CP 1999 

yb^JiijoK Xzzz 


1 AAA 

X u u u 


£ ii 




PTiTPSPPPa cnp 

vjrt. X Vjrt.vJVjV_.rt. DUO 


PTPT7T PP Q 1 "3 


PCPl"NTT PP 1 A1 Q 
KbiJlNXjyK XUXo 


rpnTl VTT CO 1 O O O 

IbAlMLioK Izz j 


•O P\ Pi 

z U U 


it 

- \i 

£ 


OZ / 


cicxcici7\ pp a tp cdq 

vJVJvJVJrtLjvjrt. X V_> O U 


TTQTJT.PP PI A 
X 1 OlN Xjlviv OXtc 


P Q Q"NTT .PP 1 O 1 Q 
KOOINIXj^K X U X y 


PCTYPT CP 1 99 A 
KoXJxIXjoK XZZ^fc 


o 
z 




cop 


VjVjVjVjrt.vJvJ.rt. X V_ Dlu 


TTQKTT.PP PI ^ 
X X OiM XjIvIn. OIj 


PCCMT.PP 1 09 0 
ivo OxN! XjV^iv 1UZU 


p CFtT-JT CP 1 9 9 C 
JXoXJjtIXjoK IZZ j 


i n 
x u 






vJrt.vjvjV_, X X VjvjVj Oil 


PTPiT-IT PTT PI C 
x\. X LJnXjrCrv old 


TO ATTT PP 1 HOI 


P C CTVTT CP i onr 
Ko£d1\IXjoK Xzzd 


1 AAA 

x u u u 






nHHHTliHHHTT CI 9 
VJv^vJvJrt.VjvJV_. X X OlZ 


TTPT?T PP £51 7 
X X ^jUjXjKK O X / 


P C CT\TT PP 1 Pi 9 9 


KbiJJiijDK lzz / 


Idu 




R9 9 


PPPPAPPPTT CI 7 
vJv^vJvjrt.vjvjL- XX Olj 


nccnT pp oi q 
yooXJXjy.K oXo 


P C CKTT PP i mo 


D C "Pi ITT CP 1 99 Q 

KoUJixjibK Xzzo 


inn 
X U U 




coo 
D J j 


PPPPAPPPTT CI A 
vjv-vjvjrt.VjVjv^ XX Oil 


nQcm pp pi q 


PCFMsTT AP 1 Pi 9 A 
KoUlNIXjxiK XUZ^fc 


PCATlT CP 1 99Q 

koaxjxjok xzzy 


•7 

/ 




D J *± 


PPPPA PPPTT CI R 
vjV^VjVjrtAjVjV., XX Olj 


pQQnT.pp pon 

^OOJ-JXiVJA. ozu 


P QnTsTT A P 1 O 9 c: 
xvoLJIMXxBXv XUZD 


P CPvPiT PP 1 9 9 n 
KoIJJJXjKK XzoU 


i n 

xu 




c;9 
j j j 


PPAPPPPPP CI C 
vJv^rt.vjV_,V_,vjVjVj DID 


PTPiWT PP P01 
rC X lJilljlvJA. oZi 


T-TQGPiT PP 1 H9C 


PCP17T CP 1 991 
yOvJxLXjbK XZJX 


i n n n 
1U u U 




e;^ p. 


ppapappptt ci 7 

vjj \^rt vJrtvJ VJ XX OX/ 


yDDJJXJv^n OZZ 


PQTTMT.AP 1 09 7 
IvOXJInIXxHlK. XUZ / 


PCPCT TP 199 9 
^bVjbXjXK iZJZ 


n n 
/ U 




540 TGGGCAGGCC 618 DRSHLTR 823 QSGSLTR 1028 RSDHLTT 1233 55
GGGGAGGAT 619 TTSNLRR 824 RSSNLQR 1029 RSDHLSR 1234 3
570 GGGGAAGGCT 620 DSGHLTR 825 QRSNLVR 1030 RSDHLTR 1235 20
571 GTGTGTGTGT 621 RSDSLTR 826 QRSNLVR 1031 RSDSLLR 1236 1000

572 GCATACGTGG 622 RSDSLLR 827
573 GCATACGTG 623 RSDSLLR 828
574 TACGTGGGGT 624 RSDHLTR 829
575 TACGTGGGCT 625 DFSHLTR 830
576 GAGGGTGTTG 626 NSDTLAR 831
577 GGAGCGGGGA 627 RSDHLSR 832
579 GGGGTTGAGG 628 RSDNLTR 833
580 GGTGTTGGAG 629 QRAHLAR 834
581 TACGTGGGTT 630 QSSHLTR 835
583 GTAGGGGTTG 631 NSSALTR 836
584 GAAGGCGGAG 632 QAGHLTR 837
585 GAAGGCGGAG 633 QAGHLTR 838
587 GGGGGTTACG 634 DKGNLQT 839
588 GGGGGGGGGG 635 RSDHLSR 840
589 GGAGTATGCT 636 DSGHLAS 841
595 TGGTTGGTAT 637 QRGSLAR 842
597 TGGTTGGTA 638 QNSAMRK 843
598 TGGTTGGTA 639 QRGSLAR 844
599 TGGTTGGTA 640 QNSAMRK 845
600 GAGTCGGAA 641 QSANLAR 846
601 GAGTCGGAA 642 RSANLTR 847
602 GAGTCGGAA 643 RSANLTR 848
603 GAGTCGGAA 644 QSGNLAR 849
606 GGGGAGGATC 645 TTSNLRR 850

DKGNLQS 1032 QSDDLTR 1237 1000
DKGNLQS 1033 QSGDLTR 1238 1000
RSDHLTR 1034 DKGNLQT 1239 25
RSDHLTR 1035 DKGNLQT 1240 472
TSGHLTR 1036 RSDNLTR 1241 200
RSDELQR 1037 QSDHLTR 1242 200
NRDTLAR 1038 TSGHLTR 1243 200
NRDTLAR 1039 TSGHLTR 1244 1000
RSDSLLR 1040 DKGNLQT 1245 382
RSDHLTR 1041 QSASLTR 1246 46
DKSHLTR 1042 QSGNLTR 1247 1000
DSGHLTR 1043 QSGNLTR 1248 1000
TSGHLTR 1044 RSDHLSK 1249 500
RSDHLTR 1045 RSDHLSK 1250 30
QSATLAR 1046 QSDHLTR 1251 1000
RGDALTR 1047 RSDHLTT 1252 73.3
RGDALTS 1048 RSDHLTT 1253 1000
RDGSLTS 1049 RSDHLTT 1254 1000
RDOSLTS 1050 RSDHLTT 1255 1000
RSDELRT 1051 RSDNLAR 1256 206.7
RLDGLRT 1052 RSDNLAR 1257 606.7
RQDTLVG 1053 RSDNLAR 1258 616.7
RSDELRT 1054 RSDNLAR 1259 166.7
RSDNLQR 1055 RSDHLSR 1260 0.2
TABLE 3

SBS# TARGET SEQ ID F1 SEQ ID F2 SEQ ID F3 SEQ ID Kd (nM)

897 GAGGAGGTGA 1261 RSDALAR 1347 RSDNLAR 1433 RSDNLVR 1519 0.07
828 GCGGAGGACC 1262 EKANLTR 1348 RSDNLAR 1434 RSDERKR 1520 0.1
884 GAGGAGGTGA 1263 RSDSLTR 1349 RSDNLAR 1435 RSDNLVR 1521 0.15
817 GAGGAGGTGA 1264 RSDSLTR 1350 RSDNLAR 1436 RSDNLAR 1522 0.31
666 GCGGAGGCGC 1265 RSDDLTR 1351 RSDNLTR 1437 RSDTLKK 1523 0.5
829 GCGGAGGACC 1266 EKANLTR 1352 RSDNLAR 1438 RSDTLKK 1524 0.52
670 GACGTGGAGG 1267 RSDNLAR 1353 RSDALAR 1439 DRSNLTR 1525 0.57
801 AAGGAGTCGC 1268 RSADLRT 1354 RSDNLAR 1440 RSDNLTQ 1526 0.85
668 GTGGAGGCCA 1269 ERGTLAR 1355 RSDNLAR 1441 RSDALAR 1527 1.13
895 ATGGATTCAG 1270 QSHDLTK 1356 TSGNLVR 1442 RSDALTQ 1528 1.4
799 GGGGGAGCTG 1271 QSSDLQR 1357 QRAHLER 1443 RSDHLSR 1529 1.85
798 GGGGGAGCTG 1272 QSSDLQR 1358 QSGHLQR 1444 RSDHLSR 1530 3
842 GAGGTGGGCT 1273 DRSHLTR 1359 RSDALAR 1445 RSDNLAR 1531 5.4
894 TCAGTGGTAT 1274 QRSALAR 1360 RSDALSR 1446 QSHDLTK 1532 6.15
892 ATGGATTCAG 1275 QSHDLTK 1361 QQSNLVR 1447 RSDALTQ 1533 6.2
888 TCAGTGGTAT 1276 QSSSLVR 1362 RSDALSR 1448 QSHDLTK 1534 14
739 GCGGGCGGGC 1277 RSDHLTR 1363 ERGHLTR 1449 RSDDLRR 1535 16.5
850 CAGGCTGTGG 1278 RSDALTR 1364 QSSDLTR 1450 RSDNLRE 1536 17
797 GCAGAGGCTG 1279 QSSDLQR 1365 RSDNLAR 1451 QSGDLTR 1537 17.5
891 TCAGTGGTAT 1280 QSSSLVR 1366 RSDALSR 1452 QSGSLRT 1538 18.5
887 TCAGTGGTAT 1281 QRSALAR 1367 RSDALSR 1453 QSGDLRT 1539 23.75
672 TCGGACGTGG 1282 RSDALAR 1368 DRSNLTR 1454 RSDELRT 1540 24
836 GGGGAGGCCC 1283 ERGTLAR 1369 RSDNLAR 1455 RSDHLSR 1541 24.25
674 GCGGCGTCGG 1284 RSDELRT 1370 RADTLRR 1456 RSDTLKK 1542 27.5
849 GGGGCCCTGG 1285 RSDALRE 1371 DRSSLTR 1457 RSDHLTQ 1543 29.05
825 GAATGGGCAG 1286 QSGSLTR 1372 RSDHLTT 1458 QSGNLTR 1544 37.3
RSDTLKK 1545 48.33
848 GGGGAGGCCC 1288 DRSSLTR 1374 RSDNLAR 1460 RSDHLSR 1546 49.5
662 AGAGCGGCAC 1289 QTGSLTR 1375 RSDELQR 1461 QSGHLNQ 1547 50
667 GAGTCGGACG 1290 DRSNLTR 1376 RSDELRT 1462 RSDNLAR 1548 50
803 GCAGCGGCTC 1291 QSSDLQR 1377 RSDELQR 1463 QSGSLTR 1549 57.5
671 TCGGACGAGT 1292 RSDNLAR 1378 DRSNLTR 1464 RSDELRT 1550 64
851 GAGATGGATC 1293 QSSNLQR 1379 RRDVLMN 1465 RLHNLQR 1551 74
804 GCAGCGGCTC 1294 QSSDLQR 1380 RSDDLNR 1466 QSGSLTR 1552 82.5
669 GACGAGTCGG 1295 RSDELRT 1381 RSDNLAR 1467 DRSNLTR 1553 90
682 GCTGCAGGAG 1296 RSDHLAR 1382 QSGDLTR 1468 QSSDLSR 1554 90
845 GAGATGGATC 1297 QSSNLQR 1383 RSDALRQ 1469 RLHNLQR 1555 112.5
663 AGAGCGGCAC 1298 QTGSLTR 1384 RSDELQR 1470 KNWKLQA 1556 115
738 GCGGGGTCCG 1299 ERGTLTT 1385 RSDHLSR 1471 RSDDLRR 1557 120
664 AGAGCGGCAC 1300 QTGSLTR 1386 RADTLRR 1472 ASSRLAT 1558 125
833 GACTAGGACC 1301 EKANLTR 1387 RSDNLTK 1473 DRSNLTR 1559 136
685 GCTGCAGGAG 1302 RSDHLAR 1388 QSGSLTR 1474 QSSDLSR 1560 150
835 TAGGGAGCGT 1303 RADTLRR 1389 QSGHLTR 1475 RSDNLTT 1561 150
847 TAGGGAGCGT 1304 RSDDLTR 1390 QSGHLTR 1476 RSDNLTT 1562 150
818 GAATGGGCAG 1305 QSGSLTR 1391 RSDHLTT 1477 QSSNLVR 1563 167
834 GACTAGGACC 1306 EKANLTR 1392 RSDHLTT 1478 DRSNLTR 1564 186
837 GGGGCCCTGG 1307 RSDALRE 1393 DRSSLTR 1479 RSDHLSR 1565 222
764 GCAGAGGCTG 1308 TSGELVR 1394 RSDNLAR 1480 QSGDLTR 1566 255
774 GCAGCGGTAG 1309 QRSALAR 1395 RSDELQR 1481 QSGDLTR 1567 258
ERGTLAR 1568 262.5
DRSDLTR 1569 262.5
775 GCAGCGGTAG 1312 QSGALTR 1398 RSDELQR 1484 QSGDLTR 1570 265
763 GCAGAGGCTG 1313 TSGELVR 1399 RSDNLAR 1485 QSGSLTR 1571 275
838 GGGGCCCTGG 1314 RSDALRE 1400 DRSSLTR 1486 RSDHLTA 1572 300
841 GAGTGTGAGG 1315 RSDNLAR 1401 QSSHLAS 1487 RSDNLAR 1573 300
770 TTGGCAGCCT 1316 DRSSLTR 1402 QSGSLTR 1488 RSDSLTK 1574 325
767 GGGGGAGCTG 1317 QSSDLAR 1403 QSGHLQR 1489 RSDHLSR 1575 335
800 TTGGCAGCCT 1318 ERGTLAR 1404 QSGSLTR 1490
832 GACTAGGACC 1319 EKANLTR 1405 RSDNLTT 1491
844 GAGATGGATC 1320 QSSNLQR 1406 RSDALRQ 1492
683 GCTGCAGGAG 1321 QSGHLAR 1407 QSGSLTR 1493
805 GCAGCGGTAG 1322 QRSALAR 1408 RSDELQR 1494
839 GAGTGTGAGG 1323 RSDNLAR 1409 TSDHLAS 1495
840 GAGTGTGAGG 1324 RSDNLAR 1410 MSHHLKT 1496
830 GGAGAGTCGG 1325 RSDELRT 1411 RSDNLAR 1497
831 GGAGAGTCGG 1326 RSDDLTK 1412 RSDNLAR 1498
684 GCTGCAGGAG 1327 RSAHLAR 1413 QSGSLTR 1499
846 GAGATGGATC 1328 QSSNLQR 1414 RRDVLMN 1500
819 AAGTAGGGTG 1329 QSSHLTR 1415 RSDNLTT 1501
820 ACGGTAGTTA 1330 QSSALTR 1416 QRSALAR 1502
821 ACGGTAGTTA 1331 NRATLAR 1417 QRSALAR 1503
822 GTGTGCTGGT 1332 RSDHLTT 1418 ERQHLAT 1504
823 GTGTGCTGGT 1333 RSDHLTK 1419 ERQHLAT 1505
824 GTGTGCTGGT 1334 RSDHLTT 1420 DRSHLRT 1506
885 GTGTGCTGGT 1335 RSDHLTK 1421 DRSHLRT 1507
886 TCAGTGGTAT 1336 QSSSLVR 1422 RSDALSR 1508
889 ATGGATTCAG 1337 QSGSLTT 1423 QQSNLVR 1509
890 CTGGTATGTC 1338 QRSHLTT 1424 QRSALAR 1510
896 AAGTAGGGTG 1339 TSGHLVR 1425 RSDNLTT 1511
898 ACGGTAGTTA 1340 NRATLAR 1426 QSSSLVR 1512
899 CTGGTATGTC 1341 QRSHLTT 1427 QSSSLVR 1513
900 CTGGTATGTC 1342 MSHHLKE 1428 QSSSLVR 1514
901 CTGGTATGTC 1343 MSHHLKE 1429 QRSALAR 1515
773 GCAGCGGTAG 1344 QSGALTR 1430 RSDELQR 1516
768 GGGGGAGCTG 1345 QSSDLAR 1431 QRAHLER 1517
681 GCTGCAGGAG 1346 RSAHLAR 1432 QSGDLTR 1518
RSDSLTK 1576 400
DRSNLTR 1577 408
RSDNLQR 1578 444
QSSDLSR 1579 500
QSGSLTR 1580 500
RSDNLAR 1581 625
RSDNLAR 1582 625
QRAHLAR 1583 683
QRAHLAR 1584 700
QSSDLSR 1585 850
RSDNLQR 1586 889.5


R^DNLTO 1587 


1000 

_L V/ \J \J 


RmiTTiTO 1 Rfift 

iVulJilJiy _L _J O O 


1000 

\J \J \J 


RSDTT.TO 158 9 

J.VkJi-' J- J 1 J- _L *J KJ *S 


1000 

J. v w v 


RSDALAR 15 90 


1000 

JL W W \J 


RSDALAR 1 591 


1000 

J- \J \J \J 


RSDATiAR 15 92 


1000 

J_ \J \J \J 


PCfDAT.AP 1 R9^ 


1 000 

J- \J \J \J 


OSGDLRT IS 94 


1000 

X v v v 


RSDALTO 1595 


1000 


T?S"DATiR"R 1 5 96 


1000 

_1_ V/ w W 


PSDNT.TO 1 RQ7 


1 000 

J- \s \J \J 


R^DTT.TO 1 R9ft 

IVuJJ X xj x v^/ J_ J ^ O 


1 000 

X w \J \J 


RSDALRE 1599 


1000 

-1- \J \J \J 


"RSDALRE 1^0 0 

ivojjriiJiVij _l \j \j \j 


1000 

X V V V 


RSDALRE 1601 


1000 


QSGSLTR 1602 


1250 


RSDHLSR 1603 


2000 


QSSDLSR 1604 


3000 
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TABLE 4 



SEQ 

^4 TARGET ID 



Fl 



SEQ 
ID 



F2 



SEQ 
ID 



F3 



SEQ 
ID 



608 TTGGCTGGGC 1606 GSWHLTR 1708 QSSDLQR 1810 RSDSLTK 1912 



Kd 
(nM) 



607 AAGGTGGCAG 1605 QSGDLTR 1707 RSDSLAR 1809 RLDNRTA 1911 6.5 
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611 GTGGCTGCAG 1607 QSGDLTR 1709 QSSDLQR 1811 RSDALAR 1913 11.5 

612 GTGGCTGCAG 1608 QSGTLTR 1710 QSSDLQR 1812 RSDALAR 1914 0.38 

613 TTGGCTGGGC 1609 RSDHLAR 1711 QSSDLQR 1813 RGDALTS 1915 1.45 

614 TTGGCTGGGC 1610 RSDHLAR 1712 QSSDLQR 1814 RSDSLTK 1916 2 

616 GAGGAGGATG 1611 QSSNLQR 1713 * RSDNLAR 1815 RSDNLQR 1917 0.08 

617 AAGGGGGGG 1612 RSDHLSR 1714 RSDHLTR 1816 RKDNMTA 1918 1 

618 AAGGGGGGG 1613 RSDHLSR 1715 RSDHLTR 1817 RKDNMTQ 1919 0.55 

619 AAGGGGGGG 1614 RSDHLSR 1716 RSDHLTR 1818 RKDNMTN 1920 1.34 

620 AAGGGGGGG 1615 RSDHLSR 1717 RSDHLTR 1819 RLDNRTA 1921 0.54 

621 AAGGGGGGG 1616 RSDHLSR 1718 RSDHLTR 1820 RLDNRTQ 1922 0.75 
624 ACGGATGTCT 1617 DRSALAR 1719 TSANLAR 1821 RSDTLRS 1923 7 
628 TTGTAGGGGA 1618 RSDHLTR 1720 RSDNLTT 1822 RGDALTS 1924 130 
62 9 TTGTAGGGGA 1619 RSSHLTR 1721 RSDNLTT 1823 RGDALTS 192 5 150 
630 CGGGGAGAGT 162 0 RSDNLAR 1722 QSGHLQR 1824 RSDHLRE 1926 37.5 

646 TTGGTGGAAG 1621 QSGNLAR 1723 RSDALAR 1825 RGDALTS 1927 35 

647 TTGGTGGAAG 1622 QSANLAR 1724 RSDALAR 1826 RGDALTS 1928 40 

651 GTTGTGGAAT 1623 QSGNLSR 1725 RSDALAR 1827 NRATLAR 1929 67.5 

652 TAGGAGGCTG 1624 QSSDLQR 1726 RSDNLAR 1828 RSDNLTT 1930 1.5 

653 TAGGAGGCTG 1625 TTSDLTR 1727 RSDNLAR 1829 RSDNLTT 1931 5.5 

654 TAGGCATAAA 1626 QSGNLRT 1728 QSGSLTR 1830 RSDNLTT 1932 105

655 TAGGCATAAA 1627 QSGNLRT 1729 QSSTLRR 1831 RSDNLTT 1933 1000

656 TAGGCATAAA 1628 QSGNLRT 1730 QSGSLTR 1832 RSDNLTS 1934 540

657 TAGGCATAAA 1629 QSGNLRT 1731 QSSTLRR 1833 RSDNLTS 1935 300
660 GAGGGAGTTC 1630 NRATLAR 1732 QSGHLTR 1834 RSDNLAR 1936 8.25



661 GAGGGAGTTC 1631 TTSALTR 1733 QSGHLTR 1835 RSDNLAR 1937 1.73

665 GCGGAGGCGC 1632 RSDDVTR 1734 RSDNLTR 1836 RSDDLRR 1938 12.5

689 AAGGCGGAGA 1633 RSDNLTR 1735 RSDELQR 1837 RLDNRTA 1939 82.5

[illegible] AAGGCGGAGA 1634 RSDNLTR 1736 RSDELQR 1838 RSDNLTQ 1940 51

693 AAGGCGGAGA 1635 RSDNLTR 1737 RADTLRR 1839 RLDNRTA 1941 95

694 AAGGCGGAGA 1636 RSDNLTR 1738 RADTLRR 1840 RSDNLTQ 1942 28.5

695 GGGGGCGAGC 1637 RSSNLTR 1739 DRSHLAR 1841 RSDHLTR 1943 850

697 TGAGCGGCGG 1638 RSDELTR 1740 RSDELSR 1842 QSGHLTK 1944 200

698 TGAGCGGCGG 1639 RSDELTR 1741 RSDELSR 1843 QSHGLTS 1945 300

699 HCGGCGGCAG 1640 QSGSLTR 1742 RSDDLQR 1844 RSDERKR 1946 21.5

700 GCGGCGGCAG 1641 QSGDLTR 1743 RSDDLQR 1845 RSDERKR 1947 45

701 ftCAGCGGAGC 1642 RSDNLAR 1744 RSDELQR 1846 QSGSLTR 1948 50.5

[illegible] [illegible] 1643 RSDNLAR 1745 RSDELQR 1847 QSGDLTR 1949 73.5

704 AAGGTGGCAG 1644 QSGDLTR 1746 RSDSLAR 1848 RSDNLTQ 1950 5

705 GGGGTGGGGC 1645 RSDHLAR 1747 RSDSLAR 1849 RSDHLSR 1951 0.01

706 GGGGTGGGGC 1646 RSDHLAR 1748 RSDSLLR 1850 RSDHLSR 1952 0.05

708 (^AHTCGGAA 1647 QSANLAR 1749 RQDTLVG 1851 RSDNLAR 1953 300

709 nAfyrcnrtAA 1648 QSANLAR 1750 RKDVLVS 1852 RSDNLAR 1954 400

710 [illegible] 1649 OCJGNLAR 1751 RLDGLRT 1853 RSDNLAR 1955 400

711 nAf-VTCn(4A A 1650 QSGNLAR 1752 RQDTLVG 1854 RSDNLAR 1956 400

712 nnTf^Afk^ACVT 1651 RSDNLAR 1753 RSDNLAR 1855 MSDHLSR 1957 9.5

713 [illegible] 1652 RSDNLAR 1754 RSDNLAR 1856 MSHHLSR 1958 0.15

714 TrinnTCGCGG 1653 RSDELRR 1755 DRSALAR 1857 RSDHLTT 1959 200

715 TOHGTOGOGG 1654 RADTLRR 1756 DRSALAR 1858 RSDHLTT 1960 0.46

716 TTHnflAnCAC 1655 OSGSTiTR 1757 QSGHLQR 1859 RGDALTS 1961 200

717 TTCHnAnrAP 1656 QSGSLTR 1758 QSGHLQR 1860 RSDALTK 1962 150

718 TTn^flAftPAO 1657 QSGSLTR 1759 QSGHLQR 1861 RSDALTR 1963 107.5

719 GGCATGGTGG 1658 RSDALTR 1760 RSDALTS 1862 DRSHLAR 1964 20

720 GAAGAGGATG 1659 TTSNLAR 1761 RSDNLAR 1863 QSGNLTR 1965 1.6

722 ATGGGGGTGG 1660 RSDALTR 1762 RSDHLTR 1864 RSDALRQ 1966 0.7

724 GGCATGGTGG 1661 RSDALTR 1763 RSDALRQ 1865 DRSHLAR 1967 2.5

725 GCTTGAGTTA 1662 QSSALAR 1764 QSGHLQK 1866 QSSDLQR 1968 3000

726 GAAGAGGATG 1663 QSSNLAR 1765 RSDNLAR 1867 QSGNLTR 1969 1.5

727 GCGGTGGCTC 1664 QSSDLTR 1766 RSDALSR 1868 RSDTLKK 1970 0.1

728 GGTGAGGAGT 1665 RSDNLAR 1767 RSDNLAR 1869 DSSKLSR 1971 15

729 GGAGGGGAGT 1666 RSDNLAR 1768 RSDHLSR 1870 QSGHLAR 1972 1000

730 TGGGTCGCGG 1667 RSDDLTR 1769 DRSALAR 1871 RSDHLTT 1973 1000

731 GTGGGGGAGA 1668 RSDNLAR 1770 RSDHLSR 1872 RSDALAR 1974 12

732 GCGGGTGGGG 1669 RSDHLAR 1771 QSSHLAR 1873 RSDDLTR 1975 22.5

733 GCGGGTGGGG 1670 RSDHLAR 1772 QSSHLAR 1874 RSDTLKK 1976 0.32

734 GGGGCTGGGT 1671 RSDHLAR 1773 QSSDLSR 1875 RSDHLSR 1977 0.25

735 GCGGTGGCTC 1672 QSSDLTR 1774 RSDALSR 1876 RSDERKR 1978 0.05

736 GAGGTGGGGA 1673 RSDHLAR 1775 RSDALSR 1877 RSDNLSR 1979 0.47

737 GGAGGGGAGT 1674 RSDNLAR 1776 RSDHLSR 1878 QRGHLSR 1980 1000

740 AAGGTGGCAG 1675 QSGSLTR 1777 RSDALAR 1879 RSDNRTA 1981 12.5

741 AAGGCTGAGA 1676 RSDNLTR 1778 QSSDLQR 1880 RSDNLTQ 1982 15

742 ACGGGGTTAT 1677 QRGALAS 1779 RSDHLSR 1881 RSDTLKQ 1983 2 9

743 ACGGGGTTAT 1678 QRGALAS 1780 RSDHLSR 1882 RSDTLTQ 1984 10

744 ACGGGGTTAT 1679 QRSALAS 1781 RSDHLSR 1883 RSDTLKQ 1985 8.33

745 ACGGGGTTAT 1680 QRSALAS 1782 RSDHLSR 1884 RSDTLTQ 1986 12.5

746 CTGGAAGCAT 1681 QSGSLTR 1783 QSGNLAR 1885 RSDALRE 1987 2.07

747 CTATTTTGGG 1682 RSDHLTT 1784 QSSALRT 1886 QSGALRE 1988 2000

748 TTGGACGGCG 1683 DSGHLTR 1785 DRSNLER 1887 RGDALTS 1989 112.3

749 TTGGACGGCG 1684 DRSHLTR 1786 DSSNLTR 1888 RGDALTS 1990 11.33

750 GAGGGAGCGA 1685 RSDELTR 1787 QSAHLAR 1889 RSDNLAR 1991 52

751 GGTGAGGAGT 1686 RSDNLAR 1788 RSDNLAR 1890 NRSHLAR 1992 7

752 GAGGTGGGGA 1687 RSHHLAR 1789 RSDALSR 1891 RSDNLSR 1993 31

757 CGGGCGGCTG 1688 QSSDLRR 1790 RSDELQR 1892 RSDHLRE 1994 14.5

758 CGGGCGGCTG 1689 QSSDLRR 1791 RADTLRR 1893 RSDHLRE 1995 16.5

759 TTGGACGGCG 1690 DSGHLTR 1792 DSSNLTR 1894 RGDALTS 1996 37

760 TTGGACGGCG 1691 DRSHLTR 1793 DRSNLER 1895 RGDALTS 1997 148.5

761 GCGGTGGCTC 1692 QSSDLQR 1794 RSDALSR 1896 RSDERKR 1998 6

762 GCGGTGGCTC 1693 QSSDLQR 1795 RSDALSR 1897 RSDTLKK 1999 18

776 ATGGACGGGT 1694 RSDHLAR 1796 DRSNLER 1898 RSDSLNQ 2000 0.4

777 ATGGACGGGT 1695 RSDHLAR 1797 DRSNLTR 1899 RSDALSA 2001 3.4

779 CGGGGAGCAG 1696 QSGSLTR 1798 QSGHLTR 1900 RSDHLAE 2002 0.5

780 CGGGGAGCAG 1697 QSGSLTR 1799 QSGHLTR 1901 RSDHLRA 2003 0.5

781 GGGGAGCAGC 1698 RSSNLRE 1800 RSDNLAR 1902 RSDHLTR 2004 4.25

783 TTGGGAGCGG 1699 RSDELTR 1801 QSGHLQR 1903 RGDALTS 2005 2000

78R TTGGGAGCGG 1700 RSDTLKK 1802 QSGHLQR 1904 RSDALTS 2006 50

[illegible] TTHnOAGCGG 1701 RSDTLKK 1803 QSGHLQR 1905 RGDALRS 2007 2000

787 AGGGAGGATG 1702 OSDNLAR 1804 RSDNLAR 1906 RSDHLTQ 2008 4

826 GAGGGAGCGA 1703 RSDELTR 1805 QSGHLAR 1907 [illegible] 2009 [illegible]

827 GAGGGAGCGA 1704 RADTLRR 1806 QSGHLAR 1908 RSDNLAR 2010 1.2

882 GCGTGGGCGT 1705 RSDELTR 1807 RSDHLTT 1909 RSDERKR 2011 0.01

883 GCGTGGGCGT 1706 RSDELTR 1808 RSDHLTT 1910 RSDERKR 2012 1

TABLE 5

SBS# TARGET SEQ ID F1 SEQ ID F2 SEQ ID F3 SEQ ID Kd (nM)



903 ATGGAAGGG 2013 RSDHLAR 2513 QSGNLAR 3013 RSDALRQ 3513 1.027 

904 AAGGGTGAC 2014 DSSNLTR 2514 QSSHLAR 3014 RSDNLTQ 3514 1 

905 GTGGTGGTG 2015 RSSALTR 2515 RSDSLAR 3015 RSDSLAR 3515 1.15 

908 AAGGTCTCA 2016 QSGDLRT 2516 DRSALAR 3016 RSDNLRQ 3516 50

909 GTGGAAGAA 2017 QSGNLSR 2517 QSGNLQR 3017 RSDALAR 3517 16.4 

910 ATGGAAGAT 2018 QSSNLAR 2518 QSGNLQR 3018 RSDALAQ 3518 0.03 

911 ATGGGTGCA 2019 QSGSLTR 2519 QSSHLAR 3019 RSDALAQ 3519 0.91 

912 TCAGAGGTG 2020 RSDSLAR 2520 RSDNLTR 3020 QSGDLRT 3520 0.135 

914 CAGGAAAAG 2021 RSDNLTQ 2521 QSGNLAR 3021 RSDNLRE 3521 1.26 

915 CAGGAAAAG 2022 RSDNLRQ 2522 QSGNLAR 3022 RSDNLRE 3522 45.15 

916 GAGGAAGGA 2023 QSGHLAR 2523 QSGNLAR 3023 RSDNLQR 3523 1.3 

919 TCATAGTAG 2024 RSDNLTT 2524 RSDNLRT 3024 QSGDLRT 3524 250 

920 GATGTGGTA 2025 QSSSLVR 2525 RSDSLAR 3025 TSANLSR 3525 4 

921 AAGGTCTCA 2026 QSGDLRT 2526 DPGALVR 3026 RSDNLRQ 3526 11 

922 AAGGTCTCA 2027 QSHDLTK 2527 DRSALAR 3027 RSDNLRQ 3527 4 

923 AAGGTCTCA 2028 QSHDLTK 2528 DPGALVR 3028 RSDNLRQ 3528 2 

926 GTGGTGGTG 2029 RSDALTR 2529 RSDSLAR 3029 RSDSLAR 3529 7.502

927 CAGGTTGAG 2030 RSDNLAR 2530 TSGSLTR 3030 RSDNLRE 3530 3.61

928 CAGGTTGAG 2031 RSDNLAR 2531 QSSALTR 3031 RSDNLRE 3531 25

929 CAGGTAGAT 2032 QSSNLAR 2532 QSATLAR 3032 RSDNLRE 3532 1.3 

931 GAGGAAGAG 2033 RSDNLAR 2533 QSSNLVR 3033 RSDNLAR 3533 2 

932 ATGGAAGGG 2034 RSDHLAR 2534 QSSNLVR 3034 RSDALRQ 3534 797 

933 GACGAGGAA 2035 QSANLAR 2535 RSDNLAR 3035 DRSNLTR 3535 500 

934 ATGGAAGAT 2036 QSSNLAR 2536 QSGNLQR 3036 RSDALTS 3536 0.07 

935 ATGGGTGCA 2037 QSGSLTR 2537 QSSHLAR 3037 RSDALTS 3537 0.91 

937 GTGGGGGCT 2038 QSSDLTR 2538 RSDHLTR 3038 RSDSLAR 3538 0.03

938 GTGGGGGCT 2039 QSSDLRR 2539 RSDHLTR 3039 RSDSLAR 3539 0.049 

939 GGGGGCTGG 2040 RSDHLTT 2540 DRSHLAR 3040 RSDHLSK 3540 0.352 

940 GGGGGCTGG 2041 RSDHLTK 2541 DRSHLAR 3041 RSDHLSK 3541 1.5 

941 GGGGCTGGG 2042 RSDHLAR 2542 QSSDLRR 3042 RSDKLSR 3542 0.077 

942 GGGGCTGGG 2043 RSDHLAR 2543 QSSDLRR 3043 RSDHLSK 3543 0.13 

943 GGGGCTGGG 2044 RSDHLAR 2544 TSGELVR 3044 RSDKLSR 3544 0.067 

944 GGGGCTGGG 2045 RSDHLAR 2545 TSGELVR 3045 RSDHLSK 3545 0.027

945 GGTGCGGTG 2046 RSDSLTR 2546 RADTLRR 3046 MSHHLSR 3546 0.027

946 GGTGCGGTG 2047 RSDSLTR 2547 RSDVLQR 3047 MSHHLSR 3547 0.027

947 GGTGCGGTG 2048 RSDSLTR 2548 RSDELQR 3048 QSSHLAR 3548 0.013 

948 GGTGCGGTG 2049 RSDSLTR 2549 RSDVLQR 3049 QSSHLAR 3549 0.017 

962 GAGGCGGCA 2050 QSGSLTR 2550 RSDELQR 3050 RSDNLAR 3550 0.015 

963 GAGGCGGCA 2051 QSGSLTR 2551 RSDDLQR 3051 RSDNLAR 3551 0.015 

964 GCGGCGGTG 2052 RSDALAR 2552 RSDELQR 3052 RSDERKR 3552 0.041

965 GCGGCGGCC 2053 ERGDLTR 2553 RSDELQR 3053 RSDERKR 3553 3.1

966 GAGGAGGCC 2054 ERGTLAR 2554 RSDNLSR 3054 RSDNLAR 3554 0.028 

967 GAGGAGGCC 2055 DRSSLTR 2555 RSDNLSR 3055 RSDNLAR 3555 0.055 

968 GAGGCCGCA 2056 QSGSLTR 2556 DRSSLTR 3056 RSDNLAR 3556 1.4 

969 GAGGCCGCA 2057 QSGSLTR 2557 DRSDLTR 3057 RSDNLAR 3557 0.275 

970 GTGGGCGCC 2058 ERGTLAR 2558 DRSHLAR 3058 RSDALAR 3558 1.859 

971 GTGGGCGCC 2059 DRSSLTR 2559 DRSHLAR 3059 RSDALAR 3559 0.144 

972 GTGGGCGCC 2060 ERGDLTR 2560 DRSHLAR 3060 RSDALAR 3560 1.748 

973 GCCGCGGTC 2061 DRSALTR 2561 RSDELQR 3061 ERGTLAR 3561 0.6

974 GCCGCGGTC 2062 DRSALTR 2562 RSDELQR 3062 DRSDLTR 3562 0.038 

975 CAGGCCGCT 2063 QSSDLTR 2563 DRSSLTR 3063 RSDNLRE 3563 1.1 

976 CAGGCCGCT 2064 QSSDLTR 2564 DRSDLTR 3064 RSDNLRE 3564 4.12 

977 CTGGCAGTG 2065 RSDSLTR 2565 QSGSLTR 3065 RSDALRE 3565 0.017 

978 CTGGCAGTG 2066 RSDSLTR 2566 QSGDLTR 3066 RSDALRE 3566 1.576 

979 CTGGCGGCG 2067 RSSDLTR 2567 RSDELQR 3067 RSDALRE 3567 1.59 

980 CTGGCGGCG 2068 RSDDLTR 2568 RSDELQR 3068 RSDALRE 3568 2.2 

981 CAGGCGGCG 2069 RSDDLTR 2569 RSDELQR 3069 RSDNLRE 3569 0.375 

982 CCGGGCTGG 2070 RSDHLTT 2570 DRSHLAR 3070 RSDELRE 3570 0.03 

983 CCGGGCTGG 2071 RSDHLTK 2571 DRSHLAR 3071 RSDELRE 3571 1.385 

984 GACGGCGAG 2072 RSDNLAR 2572 DRSHLAR 3072 DRSNLTR 3572 1.6 

985 GACGGCGAG 2073 RSDNLAR 2573 DRSHLAR 3073 EKANLTR 3573 0.965 

986 GGTGCTGAT 2074 QSSNLQR 2574 QSSDLQR 3074 MSHHLSR 3574 1.6 

987 GGTGCTGAT 2075 QSSNLQR 2575 QSSDLQR 3075 TSGHLVR 3575 33.55 

988 GGTGCTGAT 2076 TSGNLVR 2576 QSSDLQR 3076 MSHHLSR 3576 0.15 

989 GGTGAGGGG 2077 RSDHLAR 2577 RSDNLAR 3077 MSHHLSR 3577 1.9 

990 AAGGTGGGC 2078 DRSHLTR 2578 RSDSLAR 3078 RSDNLTQ 3578 5.35 

991 AAGGTGGGC 2079 DRSHLTR 2579 SSGSLVR 3079 RSDNLTQ 3579 0.06

993 GGGGCTGGG 2080 RSDHLAR 2580 TSGELVR 3080 RSDHLSR 3580 3.1

994 GGGGGCTGG 2081 RSDHLTK 2581 DRSHLAR 3081 RSDHLSR 3581 0.03 

995 GGGGAGGAA 2082 QSANLAR 2582 RSDNLAR 3082 RSDHLSK 3582 0.08 

996 CAGTTGGTC 2083 DRSALAR 2583 RSDALTS 3083 RSDNLRE 3583 9.6 

997 AGAGAGGCT 2084 QSSDLTR 2584 RSDNLAR 3084 QSGHLNQ 3584 1.65 

998 ACGTAGTAG 2085 RSANLRT 2585 RSDNLTK 3085 RSDTLKQ 3585 0.23 

999 AGAGAGGCT 2086 QSSDLTR 2586 RSDNLAR 3086 QSGKLTQ 3586 0.6 

1000 CAGTTGGTC 2087 DRSALAR 2587 RSDALTR 3087 RSDNLRE 3587 11.15

1001 GGAGCTGAC 2088 EKANLTR 2588 QSSDLSR 3088 QRAHLAR 3588 1.8 

1002 GCGGAGGAG 2089 RSDNLVR 2589 RSDNLAR 3089 RSDERKR 3589 0.028 

1003 ACGTAGTAG 2090 RSANLRT 2590 RSDNLTK 3090 RSDTLRS 3590 0.118 

1004 ACGTAGTAG 2091 RSDNLTT 2591 RSDNLTK 3091 RSDTLRS 3591 1.4 

1006 GTAGGGGCG 2092 RSDDLTR 2592 RSDHLTR 3092 QRASLTR 3592 0.898

1007 GAGAGAGAT 2093 QSSNLQR 2593 QSGHLTR 3093 RLHNLAR 3593 167 

1008 GAGATGGAG 2094 RSDNLSR 2594 RSDSLTQ 3094 RLHNLAR 3594 0.4 

1009 GAGATGGAG 2095 RSDNLSR 2595 RSDSLTQ 3095 RSDNLSR 3595 1.9 

1010 GAGAGAGAT 2096 QSSNLQR 2596 QSGHLTR 3096 RSDNLAR 3596 8.2 

1011 TTGGTGGCG 2097 RSADLTR 2597 RSDSLAR 3097 RSDSLTK 3597 0.03 

1012 GACGTAGGG 2098 RSDHLTR 2598 QSSSLVR 3098 DRSNLTR 3598 0.032 

1013 GAGAGAGAT 2099 QSSNLQR 2599 QSGHLNQ 3099 RSDNLAR 3599 0.15 
1014 GACGTAGGG 2100 RSDHLTR 2600 QSGSLTR 3100 DRSNLTR 3600 0.01

1015 GCGGAGGAG 2101 RSDNLVR 2601 RSDNLAR 3101 RSDTLKK 3601 0.008 

1016 CAGTTGGTC 2102 DRSALAR 2602 RSDSLTK 3102 RSDNLRE 3602 0.09 

1017 CTGGATGAC 2103 EKANLTR 2603 TSGNLVR 3103 RSDALRE 3603 0.233 

1018 GTAGTAGAA 2104 QSANLAR 2604 QSSSLVR 3104 QRASLAR 3604 7.2 

1019 AGGGAGGAG 2105 RSDNLAR 2605 RSDNLAR 3105 RSDHLTQ 3605 0.022 

1020 ACGTAGTAG 2106 RSDNLTT 2606 RSDNLTK 3106 RSDTLKQ 3606 0.69 
1022 GAGGAGGTG 2107 RSDALAR 2607 RSDNLAR 3107 RSDNLAR 3607 0.01 

1024 GGGGAGGAA 2108 QSANLAR 2608 RSDNLAR 3108 RSDHLSR 3608 0.08 

1025 GAGGAGGTG 2109 QSSALTR 2609 QSSSLVR 3109 RSDTLTQ 3609 0.115 

1026 GTGGCTTGT 2110 MSHHLKE 2610 QSSDLSR 3110 RSDALAR 3610 0.076 

1027 GCGGCGGTG 2111 RSDALAR 2611 RSDELQR 3111 RSDELQR 3611 0.054 

1032 GGTGCTGAT 2112 TSGNLVR 2612 QSSDLQR 3112 TSGHLVR 3612 0.52 

1033 GTGTTCGTG 2113 RSDALAR 2613 DRSALTT 3113 RSDALAR 3613 685.2 

1034 GTGTTCGTG 2114 RSDALAR 2614 DRSALTK 3114 RSDALAR 3614 14.55 

1035 GTGTTCGTG 2115 RSDALAR 2615 DRSALRT 3115 RSDALAR 3615 56 

1037 GTAGGGGCA 2116 QSGSLTR 2616 RSDHLSR 3116 QRASLAR 3616 0.05 

1038 GTAGGGGCA 2117 QTGELRR 2617 RSDHLSR 3117 QRASLAR 3617 0.152 

1039 GGGGCTGGG 2118 RSDHLSR 2618 TSGELVR 3118 RSDHLTR 3618 1.37 
1040 GGGGCTGGG 2119 RSDHLSR 2619 QSSDLQR 3119 RSDHLSK 3619 0.05
1041 TCATAGTAG 2120 RSDNLTT 2620 RSDNLRT 3120 QSHDLTK 3620 2.06 

1043 CAGGGAGAG 2121 RSDNLAR 2621 QSGHLTR 3121 RSDNLRE 3621 0.16 

1044 CAGGGAGAG 2122 RSDNLAR 2622 QRAHLER 3122 RSDNLRE 3622 1.07 

1045 GGGGCAGGA 2123 QSGHLAR 2623 QSGSLTR 3123 RSDHLSR 3623 0.15 

1046 GGGGCAGGA 2124 QSGHLAR 2624 QSGDLRR 3124 RSDHLSR 3624 0.09 

1047 GGGGCAGGA 2125 QRAHLER 2625 QSGSLTR 3125 RSDHLSR 3625 24.7 

1048 CAGGCTGTA 2126 QSGALTR 2626 QSSDLQR 3126 RSDNLRE 3626 1.387 

1049 CAGGCTGTA 2127 QRASLAR 2627 QSSDLQR 3127 RSDNLRE 3627 55.6 

1050 CAGGCTGTA 2128 QSSSLVR 2628 QSSDLQR 3128 RSDNLRE 3628 0.125 

1051 GAGGCTGAG 2129 RSDNLTR 2629 QSSDLQR 3129 RSDNLVR 3629 0.02

1052 TAGGACGGG 2130 RSDHLAR 2630 EKANLTR 3130 RSDNLTT 3630 0.28

1053 TAGGACGGG 2131 RSDHLAR 2631 DRSNLTR 3131 RSDNLTT 3631 0.025 

1054 GCTGCAGGG 2132 RSDHLAR 2632 QSGSLTR 3132 QSSDLQR 3632 0.033 

1055 GCTGCAGGG 2133 RSDHLAR 2633 QSGSLTR 3133 TSGDLTR 3633 18.73

1056 GCTGCAGGG 2134 RSDHLAR 2634 QSGSLTR 3134 QSSDLQR 3634 0.045

1057 GCTGCAGGG 2135 RSDHLAR 2635 QSGDLTR 3135 TSGDLTR 3635 0.483

1058 GGGGCCGCG 2136 RSDELTR 2636 DRSSLTR 3136 RSDHLSR 3636 6.277

1059 GGGGCCGCG 2137 RSDELTR 2637 DRSDLTR 3137 RSDHLSR 3637 0.152

1060 GCGGAGGCC 2138 ERGTLAR 2638 RSDNLAR 3138 RSDERKR 3638 0.69

1061 GTTGCGGGG 2139 RSDHLAR 2639 RSDELQR 3139 QSSALTR 3639 0.165

1062 GTTGCGGGG 2140 RSDHLAR 2640 RSDELQR 3140 TSGSLTR 3640 0.068

1063 GTTGCGGGG 2141 RSDHLAR 2641 RSDELQR 3141 MSHALSR 3641 0.96

1064 GCGGCAGTG 2142 RSDALTR 2642 QSGSLTR 3142 RSDERKR 3642 0.453

1065 TGGGGCGGG 2143 RSDHLAR 2643 DRSHLAR 3143 RSDHLTT 3643 1.37

1066 GAGGGCGGT 2144 QSSHLTR 2644 DRSHLAR 3144 RSDNLVR 3644 0.15

1067 GAGGGCGGT 2145 TSGHLVR 2645 DRSHLAR 3145 RSDNLVR 3645 1.37

1068 GCAGGGGGC 2146 DRSHLTR 2646 RSDHLTR 3146 QSGDLTR 3646 2.05

1069 GCAGGCGGT 2147 DRSHLTR 2647 RSDHLTR 3147 QSGSLTR 3647 0.1

1070 GGGGCAGGC 2148 DRSHLTR 2648 QSGSLTR 3148 RSDHLSR 3648 0.456 

1071 GGGGCAGGC 2149 DRSHLTR 2649 QSGDLTR 3149 RSDHLSR 3649 0.2 

1072 GGATTGGCT 2150 QSSDLTR 2650 RSDALTT 3150 QRAHLAR 3650 0.46 

1073 GGATTGGCT 2151 QSSDLTR 2651 RSDALTK 3151 QRAHLAR 3651 1.37

1075 GTGTTGGCG 2152 RSDELTR 2652 RSDALTK 3152 RSDALTR 3652 0.915 

1076 GCGGCAGCG 2153 RSDELTR 2653 QSGSLTR 3153 RSDERKR 3653 4.1 

1077 GCGGCAGCG 2154 RSDELTR 2654 QSGDLRR 3154 RSDERKR 3654 6.2 

1078 GGGGGGGCC 2155 ERGTLAR 2655 RSDHLSR 3155 RSDHLSR 3655 0.2 

1079 GGGGGGGCC 2156 ERGDLTR 2656 RSDHLSR 3156 RSDHLSR 3656 4.1 

1080 CTGGAGGCG 2157 RSDELTR 2657 RSDNLAR 3157 RSDALRE 3657 1.37 

1081 GGGGAGGTG 2158 RSDALTR 2658 RSDNLTR 3158 RSDHLSR 3658 0.05 

1082 CTGGCGGCG 2159 RSDELTR 2659 RSDELTR 3159 RSDALRE 3659 0.152 

1083 CTGGTGGCA 2160 QSGDLTR 2660 RSDALSR 3160 RSDALRE 3660 0.152 

1084 GGTGAGGCG 2161 RSDELTR 2661 RSDNLAR 3161 MSHHLSR 3661 0.5 

1085 GGTGAGGCG 2162 RSDELTR 2662 RSDNLAR 3162 QSSHLAR 3662 0.46 

1086 GGGGCTGGG 2163 RSDHLSR 2663 QSSDLQR 3163 RSDHLTR 3663 0.1 

1087 CGGGCGGCC 2164 ERGDLTR 2664 RSDELQR 3164 RSDHLAE 3664 1.24 

1088 CGGGCGGCC 2165 ERGDLTR 2665 RSDELQR 3165 RSDHLRE 3665 0.905 

1089 GACGAGGCT 2166 QSSDLRR 2666 RSDNLAR 3166 DRSNLTR 3666 0.171 

1090 AAGGCGCTG 2167 RSDALRE 2667 RSDELQR 3167 RSDNLTQ 3667 30.3 

1091 GTAGAGGAC 2168 DRSNLTR 2668 RSDNLAR 3168 QRASLAR 3668 0.085 

1092 GCCTTGGCT 2169 QSSDLRR 2669 RGDALTS 3169 DRSDLTR 3669 2.735 
1093 GCGGAGTCG 2170 RSADLRT 2670 RSDNLAR 3170 RSDERKR 3670 0.046

1094 GCGGTTGGT 2171 TSGHLVR 2671 QSSALTR 3171 RSDERKR 3671 12.34 

1095 GGGGGAGCC 2172 ERGDLTR 2672 QRAHLER 3172 RSDHLSR 3672 0.395 

1096 GGGGGAGCC 2173 DRSSLTR 2673 QRAHLER 3173 RSDHLSR 3673 0.019 

1097 GAGGCCGAA 2174 QSANLAR 2674 DCRDLAR 3174 RSDNLAR 3674 0.77 

1098 GCCGGGGAG 2175 RSDNLTR 2675 RSDHLTR 3175 DRSDLTR 3675 0.055 

1099 GCGGAGTCG 2176 TSGHLVR 2676 TSGSLTR 3176 RSDERKR 3676 0.45 

1100 GTGTTGGTA 2177 QSGALTR 2677 RGDALTS 3177 RSDALTR 3677 1.4 

1101 ATGGGAGTT 2178 TTSALTR 2678 QRAHLER 3178 RSDALRQ 3678 0.065 

1102 AAGGCAGAA 2179 QSANLAR 2679 QSGSLTR 3179 RSDNLTQ 3679 8.15 

1103 AAGGCAGAA 2180 QSANLAR 2680 QSGDLTR 3180 RSDNLTQ 3680 1.4

1104 CGGGCAGCT 2181 QSSDLRR 2681 QSGSLTR 3181 RSDHLRE 3681 0.08 

1105 CTGGCAGCC 2182 ERGDLTR 2682 QSGDLTR 3182 RSDALRE 3682 2.45 

1106 CTGGCAGCC 2183 DRSSLTR 2683 QSGDLTR 3183 RSDALRE 3683 0.19

1107 GCGGGAGTT 2184 QSSALAR 2684 QRAHLER 3184 RSDERKR 3684 0.06

1108 CAGGCTGGA 2185 QSGHLAR 2685 TSGELVR 3185 RSDNLRE 3685 0.007

1109 AGGGGAGCC 2186 ERGDLTR 2686 QRAHLER 3186 RSDHLTQ 3686 0.347

1110 AGGGGAGCC 2187 DRSSLTR 2687 QRAHLER 3187 RSDHLTQ 3687 0.095 

1111 CTGGTAGGG 2188 RSDHLAR 2688 QSSSLVR 3188 RSDALRE 3688 0.095 

1112 CTGGTAGGG 2189 RSDHLAR 2689 QSATLAR 3189 RSDALRE 3689 0.125 

1113 CTGGGGGCA 2190 QSGDLTR 2690 RSDHLTR 3190 RSDALRE 3690 0.06 

1114 CAGGTTGAT 2191 QSSNLAR 2691 TSGSLTR 3191 RSDNLRE 3691 2.75 

1115 CAGGTTGAT 2192 QSSNLAR 2692 QSSALTR 3192 RSDNLRE 3692 0.7 

1116 CCGGAAGCG 2193 RSDELTR 2693 QSSNLVR 3193 RSDELRE 3693 12.3 
1117 GCAGCGCAG 2194 RSSNLRE 2694 RSDELTR 3194 QSGSLTR 3694 2.85

1118 TAGGGAGTC 2195 DRSALTR 2695 QRAHLER 3195 RSDNLTT 3695 1.4 

1119 TGGGAGGGT 2196 TSGHLVR 2696 RSDNLAR 3196 RSDHLTT 3696 0.1 

1120 AGGGACGCG 2197 RSDELTR 2697 DRSNLTR 3197 RSDHLTQ 3697 2.735 

1121 CTGGTGGCC 2198 ERGDLTR 2698 RSDALTR 3198 RSDALRE 3698 2.76 

1122 CTGGTGGCC 2199 DRSSLTR 2699 RSDALTR 3199 RSDALRE 3699 0.101 

1123 TAGGAAGCA 2200 QSGSLTR 2700 QSGNLAR 3200 RSDNLTT 3700 0.065

1124 GTGGATGGA 2201 QSGHLAR 2701 TSGNLVR 3201 RSDALTR 3701 0.101

1126 TTGGCTATG 2202 RSDALTS 2702 TSGELVR 3202 RGDALTS 3702 0.46 

1127 CAGGGGGTT 2203 QSSALAR 2703 RSDHLTR 3203 RSDNLRE 3703 0.1 

1128 AAGGTCGCC 2204 ERGDLTR 2704 DPGALVR 3204 RSDNLTQ 3704 5.45

1130 GGTGCAGAC 2205 DRSNLTR 2705 QSGDLTR 3205 MSHHLSR 3705 0.1

1131 GTGGGAGCC 2206 ERGDLTR 2706 QRAHLER 3206 RSDALTR 3706 0.95 

1132 GGGGCTGGA 2207 QSGHLAR 2707 TSGELVR 3207 RSDHLSR 3707 0.055 

1133 GGGGCTGGA 2208 QRAHLER 2708 TSGELVR 3208 RSDHLSR 3708 0.5 

1134 TGGGGGTGG 2209 RSDHLTT 2709 RSDHLTR 3209 RSDHLTT 3709 0.067 

1135 GCGGCGGGG 2210 RSDHLAR 2710 RSDELQR 3210 RSDERKR 3710 0.025 

1136 CCGGGAGTG 2211 RSDALTR 2711 QRAHLER 3211 RSDTLRE 3711 0.225 

1137 CCGGGAGTG 2212 RSSALTR 2712 QRAHLER 3212 RSDTLRE 3712 0.085 

1138 CAGGGGGTA 2213 QSGALTR 2713 RSDHLTR 3213 RSDNLRE 3713 0.027 

1139 ACGGCCGAG 2214 RSDNLAR 2714 DRSDLTR 3214 RSDTLTQ 3714 0.535 

1140 AAGGGTGCG 2215 RSDELTR 2715 QSSHLAR 3215 RSDNLTQ 3715 0.3 

1141 ATGGACTTG 2216 RGDALTS 2716 DRSNLTR 3216 RSDALTQ 3716 1.7 

1148 TTGGAGGAG 2217 RSDNLTR 2717 RSDNLTR 3217 RGDALTS 3717 0.006 

1149 TTGGAGGAG 2218 RSDNLTR 2718 RSDNLTR 3218 RSDALTK 3718 0.004 

1150 GAAGAGGCA 2219 QSGSLTR 2719 RSDNLTR 3219 QSGNLTR 3719 0.004 

1151 GTAGTATGG 2220 RSDHLTT 2720 QRSALAR 3220 QRASLAR 3720 1.63 

1152 AAGGCTGGA 2221 QSGHLAR 2721 TSGELVR 3221 RSDNLTQ 3721 1.605 

1153 AAGGCTGGA 2222 QRAHLAR 2722 TSGELVR 3222 RSDNLTQ 3722 8.2 

1154 CTGGCGTAG 2223 RSDNLTT 2723 RSDELQR 3223 RSDALRE 3723 1.04 

1156 ATGGTTGAA 2224 QSANLAR 2724 QSSALTR 3224 RSDALRQ 3724 7.2 

1157 ATGGTTGAA 2225 QSANLAR 2725 TSGSLTR 3225 RSDALRQ 3725 0.885 

1158 AGGGGAGAA 2226 QSANLAR 2726 QSGHLTR 3226 RSDHLTQ 3726 0.1 

1159 AGGGGAGAA 2227 QSANLAR 2727 QRAHLER 3227 RSDHLTQ 3727 0.555 

1160 TGGGAAGGC 2228 DRSHLAR 2728 QSSNLVR 3228 RSDHLTT 3728 0.415 

1161 GAGGCCGGC 2229 DRSHLAR 2729 DRSDLTR 3229 RSDNLAR 3729 0.45 

1162 GTGTTGGTA 2230 QSGALTR 2730 RADALMV 3230 RSDALTR 3730 0.465 

1163 GTGTGAGCC 2231 ERGDLTR 2731 QSGHLTT 3231 RSDALTR 3731 1.45 

1164 GTGTGAGCC 2232 ERGDLTR 2732 QSVHLQS 3232 RSDALTR 3732 15.4 

1165 GCGAAGGTG 2233 RSDALTR 2733 RSDNLTQ 3233 RSDERKR 3733 1.4 

1166 GCGAAGGTG 2234 RSDALTR 2734 RSDNLTQ 3234 RSSDRKR 3734 0.195 

1167 GCGAAGGTG 2235 RSDALTR 2735 RSDNLTQ 3235 RSHDRKR 3735 0.95 

1168 AAGGCGCTG 2236 RSDALRE 2736 RSSDLTR 3236 RSDNLTQ 3736 2.8 

1169 GTAGAGGAC 2237 DRSNLTR 2737 RSDNLAR 3237 QSSSLVR 3737 0.053 

1170 GCCTTGGCT 2238 QSSDLRR 2738 RADALMV 3238 DRSDLTR 3738 2.75 

1171 GCGGAGTCG 2239 RSDDLRT 2739 RSDNLAR 3239 RSDERKR 3739 0.18 

1172 GCCGGGGAG 2240 RSDNLTR 2740 RSDHLTR 3240 ERGDLTR 3740 0.01 
1173 GCTGAAGGG 2241 RSDHLSR 2741 QSGNLAR 3241 QSSDLRR 3741 0.008

1174 GCTGAAGGG 2242 RSDHLSR 2742 QSSNLVR 3242 QSSDLRR 3742 0.018 

1175 AAGGTCGCC 2243 DRSDLTR 2743 DPGALVR 3243 RSDNLTQ 3743 8.9 

1176 GTGGGAGCC 2244 DRSDLTR 2744 QRAHLER 3244 RSDALTR 3744 4.1 

1177 CCGGGCGCA 2245 QSGSLTR 2745 DRSHLAR 3245 RSDTLRE 3745 4.1 

1178 GAGGATGGC 2246 DRSHLAR 2746 TSGNLVR 3246 RSDNLAR 3746 0.085 

1179 GCAGCGCAG 2247 RSSNLRE 2747 RSSDLTR 3247 QSGSLTR 3747 2.735 

1180 AAGGAAAGA 2248 QSGHLNQ 2748 QSGNLAR 3248 RSDNLTQ 3748 4.825 

1181 TTGGCTATG 2249 RSDALRQ 2749 TSGELVR 3249 RGDALTS 3749 8.2 

1182 CAGGAAGGC 2250 DRSHLAR 2750 QSGNLAR 3250 RSDNLRE 3750 1.48 

1183 CAGGAAGGC 2251 DRSHLAR 2751 QSSNLVR 3251 RSDNLRE 3751 1.935 

1184 AAGGAAAGA 2252 KNWKLQA 2752 QSGNLAR 3252 RSDNLTQ 3752 2.785 

1185 AAGGAAAGA 2253 KNWKLQA 2753 QSHNLAR 3253 RSDNLTQ 3753 5.25 

1186 GCCGAGGTG 2254 RSDSLLR 2754 RSKNLQR 3254 ERGTLAR 3754 27.5 

1187 CTGGTGGGC 2255 DRSHLAR 2755 RSDALTR 3255 RSDALRE 3755 0.006 

1188 GTAGTATGG 2256 RSDHLTT 2756 QSSSLVR 3256 QRASLAR 3756 2.74 

1189 ATGGTTGAA 2257 QSANLAR 2757 TSGALTR 3257 RSDALRQ 3757 1.51 

1190 ATGGCAGTG 2258 RSDALTR 2758 QSGDLTR 3258 RSDSLNQ 3758 1.484 

1191 ATGGCAGTG 2259 RSDALTR 2759 QSGSLTR 3259 RSDSLNQ 3759 5.325

1192 ATGGCAGTG 2260 RSDALTR 2760 QSGDLTR 3260 RSDALTQ 3760 2.364 

1193 ATGGCAGTG 2261 RSDALTR 2761 QSGSLTR 3261 RSDALTQ 3761 3.125 

1194 GAGAAGGTG 2262 RSDALTR 2762 RSDNRTA 3262 RSDNLTR 3762 2.19 

1195 GAGAAGGTG 2263 RSDALTR 2763 RSDNRTA 3263 RSSNLTR 3763 2.8 
1197 GAAGGTGCC 2264 ERGDLTR 2764 MSHHLSR 3264 QSGNLTR 3764 14.8 

1199 ATGGAGAAG 2265 RSDNRTA 2765 RSDNLTR 3265 RSDALTQ 3765 3.428

1200 ATGGAGAAG 2266 RSDNRTA 2766 RSSNLTR 3266 RSDALTQ 3766 16.87 

1201 ATGGAGAAG 2267 RSDNRTA 2767 RSHNLTR 3267 RSDALTQ 3767 14.8 

1202 CTGGAGTAC 2268 DRSNLRT 2768 RSDNLTR 3268 RSDALRE 3768 2.834 

1203 GGAGTACTG 2269 RSDALRE 2769 QRSALAR 3269 QRAHLAR 3769 2.945 

1204 GGAGTACTG 2270 RSDALRE 2770 QSSSLVR 3270 QRAHLAR 3770 4.38 

1205 CGGGCAGCT 2271 QSSDLRR 2771 QSGDLTR 3271 RSDHLRE 3771 0.9 

1206 GCGGGAGTT 2272 TTSALTR 2772 QRAHLER 3272 RSDERKR 3772 0.034 

1207 CAGGCTGGA 2273 QRAHLER 2773 TSGELVR 3273 RSDNLRE 3773 0.45 
1209 CCGGAAGCG 2274 RSDELTR 2774 QSSNLVR 3274 RSDTLRE 3774 19.28 

1211 GCAGCGCAG 2275 RSDNLRE 2775 RSDELTR 3275 QSGSLTR 3775 6.5 

1212 CAGGGGGTT 2276 TTSALTR 2776 RSDHLTR 3276 RSDNLRE 3776 0.05 

1213 GAAGAAGAG 2277 RSDNLTR 2777 QSSNLVR 3277 QSGNLTR 3777 12.3 

1214 ATGGGAGTT 2278 TTSALTR 2778 QRAHLER 3278 RSDALTQ 3778 0.46 

1215 GTGGGGGCT 2279 QSSDLRR 2779 RSDHLTR 3279 RSDALTR 3779 0.003 

1217 GAAGAGGCA 2280 QSGSLTR 2780 RSDNLTR 3280 QSANLTR 3780 0.004 

1218 GCGGTGAGG 2281 RSDHLTQ 2781 RSQALTR 3281 RSDERKR 3781 0.46 

1219 AAGGAAAGG 2282 RSDHLTQ 2782 QSHNLAR 3282 RSDNLTQ 3782 0.68 

1220 AAGGAAAGG 2283 RSDHLTQ 2783 QSGNLAR 3283 RSDNLTQ 3783 0.175 

1221 AAGGAAAGG 2284 RSDHLTQ 2784 QSSNLVR 3284 RSDNLTQ 3784 1.4 

1222 CAGGAGGGC 2285 DRSHLAR 2785 RSDNLAR 3285 RSDNLRE 3785 0.155 

1223 ATGGACTTG 2286 RSDALTK 2786 DRSNLTR 3286 RSDALTQ 3786 7

1224 ATGGACTTG 2287 RADALMV 2787 DRSNLTR 3287 RSDALTQ 3787 12

1227 GAATAGGGG 2288 RSDHLSR 2788 RSDHLTK 3288 QSGNLAR 3788 25

1228 ACGGCCGAG 2289 RSDNLAR 2789 DRSDLTR 3289 RSDDLTQ 3789 12

1229 AAGGGTGCG 2290 RSDELTR 2790 MSHHLSR 3290 RSDNLTQ 3790 8.2

1230 AAGGGAGAC 2291 DRSNLTR 2791 QSGHLTR 3291 RSDNLTQ 3791 0.383 

1231 AAGGGAGAC 2292 DRSNLTR 2792 QRAHLER 3292 RSDNLTQ 3792 0.213 

1232 TGGGACCTG 2293 RSDALRE 2793 DRSNLTR 3293 RSDHLTT 3793 0.113 

1233 TGGGACCTG 2294 RSDALRE 2794 DRSNLTR 3294 RSDHLTT 3794 0.635 

1234 GAGTAGGCA 2295 QSGSLTR 2795 RSDNLTK 3295 RSDNLAR 3795 0.101

1236 GAGTAGGCA 2296 QSGSLTR 2796 RSDHLTT 3296 RSDNLAR 3796 0.065 

1237 GAAGGAGAG 2297 RSDNLAR 2797 QRAHLER 3297 QSGNLAR 3797 0.065 
1238 CTGGATGTT 2298 QSSALAR 2798 TSGNLVR 3298 RSDALRE 3798 0.313

1239 CAGGACGTG 2299 RSDALTR 2799 DPGNLVR 3299 RSDNLKD 3799 0.144

1240 GGGGAGGCA 2300 QSGSLTR 2800 RSDNLTR 3300 RSDHLSR 3800 0.056 

1241 GAGGTGTCA 2301 QSHDLTK 2801 RSDALAR 3301 RSDNLAR 3801 0.027 

1242 GGGGTTGAA 2302 QSANLAR 2802 TSGSLTR 3302 RSDHLSR 3802 0.02 

1243 GGGGTTGAA 2303 QSANLAR 2803 QSSALTR 3303 RSDHLSR 3803 0.101 

1244 GTCGCGGTG 2304 RSDALTR 2804 RSDELQR 3304 DRSALAR 3804 0.044 
1245 GTCGCGGTG 2305 RSDALTR 2805 RSDELQR 3305 DSGSLTR 3805 0.102

1246 GTGGTTGCG 2306 RSDELTR 2806 TSGSLTR 3306 RSDALTR 3806 0.051

1247 GTGGTTGCG 2307 RSDELTR 2807 TSGALTR 3307 RSDALTR 3807 0.117

1248 GTCTAGGTA 2308 QSGALTR 2808 RSDNLTT 3308 DRSALAR 3808 5.14

1249 CCGGGAGCG 2309 RSDELTR 2809 QSGHLTR 3309 RSDTLRE 3809 0.26

1250 GAAGGAGAG 2310 RSDNLAR 2810 QSGHLTR 3310 QSGNLAR 3810 0.31

1252 CCGGCTGGA 2311 QRAHLER 2811 QSSDLTR 3311 RSDTLRE 3811 0.153

1253 CCGGGAGCG 2312 RSDELTR 2812 QRAHLER 3312 RSDTLRE 3812 0.228

1255 ACGTAGTAG 2313 RSDNLTT 2813 RSDNLTK 3313 RSDTLKQ 3813 0.69

1256 GGGGAGGAT 2314 QSSNLAR 2814 RSDNLQR 3314 RSDHLSR 3814 2

1257 GGGGAGGAT 2315 TTSNLAR 2815 RSDNLQR 3315 RSDHLSR 3815 1

1258 GGGGAGGAT 2316 QSSNLRR 2816 RSDNLQR 3316 RSDHLSR 3816 2

1259 GAGTGTGTG 2317 RSDSLLR 2817 DRDHLTR 3317 RSDNLAR 3817 1.5

1260 GAGTGTGTG 2318 RLDSLLR 2818 DRDHLTR 3318 RSDNLAR 3818 1.8

1261 TGCGGGGCA 2319 QSGDLTR 2819 RSDHLTR 3319 RRDTLHR 3819 0.2 

1262 TGCGGGGCA 2320 QSGDLTR 2820 RSDHLTR 3320 RLDTLGR 3820 3 

1263 TGCGGGGCA 2321 QSGDLTR 2821 RSDHLTR 3321 DSGHLAS 3821 21 

1264 AAGTTGGTT 2322 TTSALTR 2822 RADALMV 3322 RSDNLTQ 3822 0.21 

1265 AAGTTGGTT 2323 TTSALTR 2823 RSDALTT 3323 RSDNLTQ 3823 0.077 

1266 CAGGGTGGC 2324 DRSHLTR 2824 QSSHLAR 3324 RSDNLRE 3824 6.1 

1267 TAGGCAGTC 2325 DRSALTR 2825 QSGSLTR 3325 RSDNLTT 3825 6 

1268 CTGTTGGCT 2326 QSSDLTR 2826 RADALMV 3326 RSDALRE 3826 1.52 

1269 CTGTTGGCT 2327 QSSDLTR 2827 RSDALTT 3327 RSDALRE 3827 12.3 

1270 TTGGATGGA 2328 QSGHLAR 2828 TSGNLVR 3328 RSDALTK 3828 0.4 

1271 GTGGCACTG 2329 RSDALRE 2829 QSGSLTR 3329 RSDALTR 3829 0.915 

1272 CAGGAGTCC 2330 DRSSLTT 2830 RSDNLAR 3330 RSDNLRE 3830 0.04 

1273 CAGGAGTCC 2331 ERGDLTT 2831 RSDNLAR 3331 RSDNLRE 3831 0.1

1274 GCATGGGAA 2332 QSANLSR 2832 RSDHLTT 3332 QSGSLTR 3832 0.306 

1275 GCATGGGAA 2333 QRSNLVR 2833 RSDHLTT 3333 QSGSLTR 3833 0.326 

1276 TAGGAAGAG 2334 RSDNLAR 2834 QRSNLVR 3334 RSDNLTT 3834 0.685 
1277 GAAGAGGGG 2335 RSDHLAR 2835 RSDNLAR 3335 QSGNLTR 3835 0.421

1278 GAGTAGGCA 2336 QSGSLTR 2836 RSDNLRT 3336 RSDNLAR 3836 0.019 

1279 GAGGTGTCA 2337 QSGDLRT 2837 RSDALAR 3337 RSDNLAR 3837 0.025 
1282 TCGGTCGCC 2338 ERGDLTR 2838 DPGALVR 3338 RSDELRT 3838 74.1 

1287 GTGGTAGGA 2339 QSGHLAR 2839 QSGALAR 3339 RSDALTR 3839 0.152 

1288 CAGGGTGGC 2340 DRSHLTR 2840 QSSHLAR 3340 RSDNLTE 3840 4.1 
1289 TAGGCAGTC 2341 DRSALTR 2841 QSGSLTR 3341 RSDNLTK 3841 1.37

1290 GTGGTGATA 2342 QSGALTQ 2842 RSHALTR 3342 RSDALTR 3842 24.05 

1291 GTGGTGATA 2343 QQASLNA 2843 RSHALTR 3343 RSDALTR 3843 20.55 

1292 TTGGATGGA 2344 QSGHLAR 2844 TSGNLVR 3344 RSDALTT 3844 4.12 
1293 AAGGTAGGT 2345 TSGHLVR 2845 QSGALAR 3345 RSDNLTQ 3845 0.457

1294 AAGGTAGGT 2346 MSHHLSR 2846 QSGALAR 3346 RSDNLTQ 3846 2.75

1295 CAGGAGTCC 2347 DRSSLTT 2847 RSDNLAR 3347 RSDNLTE 3847 0.116

1296 CAGGAGTCC 2348 ERGDLTT 2848 RSDNLAR 3348 RSDNLTE 3848 37

1297 TAGGAAGAG 2349 RSDNLAR 2849 QRSNLVR 3349 RSDNLTK 3849 0.05

1298 CAGGACGTG 2350 RSDLATR 2850 DPGNLVR 3350 RSDNLTE 3850 0.05

1300 GTCTAGGTA 2351 QSGALTR 2851 RSDNLTK 3351 DRSALAR 3851 0.46

1302 CCGGCTGGA 2352 QSGHLTR 2852 QSSDLTR 3352 RSDTLRE 3852 0.05

1303 TAGGAGTTT 2353 QRSALAS 2853 RSDNLAR 3353 RSDNLTT 3853 0.088

1306 CTGGCCTTG 2354 RSDALTT 2854 DCRDLAR 3354 RSDALRE 3854 2.285

1308 TGGGCAGCC 2355 ERGTLAR 2855 QSGSLTR 3355 RSDHLTT 3855 0.305

1309 TAGGAGTTT 2356 QSSALAS 2856 RSDNLAR 3356 RSDNLTT 3856 0.184

1310 TAGGAGTTT 2357 TTSALAS 2857 RSDNLAR 3357 RSDNLTT 3857 0.075

1311 TGGGCAGCC 2358 ERGDLAR 2858 QSGSLTR 3358 RSDHLTT 3858 0.91

1312 GGGGCGTGA 2359 QSGHLTK 2859 RSDELQR 3359 RSDHLSR 3859 0.23

1313 GGGGCGTGA 2360 QSGHLTT 2860 RSDELQR 3360 RSDHLSR 3860 0.09

1314 GTACAGTAG 2361 RSDNLTT 2861 RSDNLRE 3361 QSSSLVR 3861 3.09

1315 GTACAGTAG 2362 RSDNLTT 2862 RSDNLTE 3362 QSSSLVR 3862 9.27

1318 ATGGTGTGT 2363 TSSHLAS 2863 RSDALAR 3363 RSDALAQ 3863 0.048

1319 ATGGTGTGT 2364 MSHHLTT 2864 RSDALAR 3364 RSDALAQ 3864 0.228

1320 TTGGGAGAG 2365 RSDNLAR 2865 QRAHLER 3365 RSDALTT 3865 0.044

1321 TTGGGAGAG 2366 RSDNLAR 2866 QRAHLER 3366 RADALMV 3866 0.127 

1322 GTGGGAATA 2367 QSGALTQ 2867 QSGHLTR 3367 RSDALTR 3867 0.799 

1323 GTGGGAATA 2368 QLTGLNQ 2868 QSGHLTR 3368 RSDALTR 3868 0.744 

1324 GTGGGAATA 2369 QQASLNA 2869 QSHHLTR 3369 RSDALTR 3869 18.52 

1325 TTGGTTGGT 2370 TSGHLVR 2870 TSGSLTR 3370 RSDALTK 3870 0.306 

1326 TTGGTTGGT 2371 TSGHLVR 2871 QSSALTR 3371 RSDALTK 3871 4.385 

1327 TTGGTTGGT 2372 TSGHLVR 2872 TSGSLTR 3372 RSDALTT 3872 0.566 

1328 TTGGTTGGT 2373 TSGHLVR 2873 QSSALTR 3373 RSDALTT 3873 7.95 

1329 CTGGCCTGG 2374 RSDHLTT 2874 DRSDLTR 3374 RSDALRE 3874 0.68 

1330 GAGGTGTGA 2375 QSGHLTT 2875 RSDALTR 3375 RSDNLAR 3875 0.175 

1331 CTGGCCTGG 2376 RSDHLTT 2 876 DCRDLAR 3 376 RSDALRE 3 876 0.388 

1334 CCGGCGCTG 2377 RSDALRE 2877 RSSDLTR 3377 RSDDLRE 3 877 0.31 

1335 GACGCTGGC 2378 DRSHLTR 2878 QSSDLTR 3378 DSSNLTR 3878 1.4 

1336 CGGGCTGGA 2379 QSGHLAR 2879 QSSDLTR 3379 RSDHLAE 3 879 1.4 

1337 CGGGCTGGA 2380 QSSHLAR 2880 QSSDLTR 3380 RSDHLAE 3 880 0.235 
133 8 GGGATGGCG 2381 RSDELTR 2 881 RSDALTQ 3381 RSDHLSR 3881 1.04 
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1339 GGGATGGCG 2382 RSDELTR 2882 RSDSLTQ 3382 RSDHLSR 3882 0.569 

1340 GGGATGGCG 2383 RSDELTR 2883 RSDALTQ 3383 RSHHLSR 3883 0.751 

1341 GGGATGGCG 2384 RSDELTR 2884 RSDSLTQ 3384 RSHHLSR 3884 4.1 

1342 CAGGCGCAG 2385 RSDNLRE 2885 RSSDLTR 3385 RSDNLTE 3885 0.68

1343 CAGGCGCAG 2386 RSDNLTT 2886 RTSTLTR 3386 RSDNLTE 3886 37.04

1344 CCGGGCGAC 2387 DRSNLTR 2887 DRSHLAR 3387 RSDTLRE 3887 2.28

1346 GATGTGTGA 2388 QSGHLTT 2888 RSDALAR 3388 TSANLSR 3888 0.153

1347 CAGTGAATG 2389 RSDALTS 2889 QSHHLTT 3389 RSDNLTE 3889 8.23

1348 GGGTCACTG 2390 RSDALTA 2890 QAATLTT 3390 RSDHLSR 3890 2.58

1350 CAGTGAATG 2391 RSDALTQ 2891 QSGHLTT 3391 RSDNLTE 3891 74.1

1351 GGGTCACTG 2392 RSDALRE 2892 QSHDLTK 3392 RSDHLSR 3892 0.234

1352 GTGTGGGTC 2393 DRSALAR 2893 RSDHLTT 3393 RSDALTR 3893 0.023

1353 CTGGCGAGA 2394 QSGHLNQ 2894 RSDELQR 3394 RSDALRE 3894 56.53

1354 CTGGCGAGA 2395 KNWKLQA 2895 RSDELQR 3395 RSDALRE 3895 20.85

1355 GCTTTGGCA 2396 QSGSLTR 2896 RSDALTT 3396 QSSDLTR 3896 0.172 

1356 GCTTTGGCA 2397 QSGSLTR 2897 RADALMV 3397 QSSDLTR 3897 0.034 

1357 GACTTGGTA 2398 QSSSLVR 2898 RSDALTT 3398 DRSNLTR 3898 0.032
1358 GACTTGGTA 2399 QSSSLVR 2899 RADALMV 3399 DRSNLTR 3899 0.05
1360 CAGTTGTGA 2400 QSGHLTT 2900 RADALMV 3400 RSDNLTE 3900 41.7
1361 AAGGAAAAA 2401 QKTNLDT 2901 QSGNLQR 3401 RSDNLTQ 3901 0.835
1362 AAGGAAAAA 2402 QSGNLNQ 2902 QSGNLQR 3402 RSDNLTQ 3902 0.332
1363 AAGGAAAAA 2403 QKTNLDT 2903 QRSNLVR 3403 RSDNLTQ 3903 74.1
1364 ATGGGTGAA 2404 QSANLSR 2904 QSSHLAR 3404 RSDALAQ 3904 1.22
1365 ATGGGTGAA 2405 QRSNLVR 2905 QSSHLAR 3405 RSDALAQ 3905 0.152
1366 ATGGGTGAA 2406 QSANLSR 2906 TSGHLVR 3406 RSDALAQ 3906 22.63
1367 ATGGGTGAA 2407 QRSNLVR 2907 TSGHLVR 3407 RSDALAQ 3907 1.028
1368 CTGGGAGAT 2408 QSSNLAR 2908 QRAHLER 3408 RSDALRE 3908 0.051
1369 CTGGGAGAT 2409 QSSNLAR 2909 QSGHLTR 3409 RSDALRE 3909 0.227
1373 GTGGTGGGC 2410 DRSHLTR 2910 RSDALSR 3410 RSDALTR 3910 0.025
1374 CCGGCGGTG 2411 RSDALTR 2911 RSDELQR 3411 RSDELRE 3911 0.003
1375 CCGGCGGTG 2412 RSDALTR 2912 RSDDLQR 3412 RSDELRE 3912 0.008

1376 CCGGCGGTG 2413 RSDALTR 2913 RSDERKR 3413 RSDELRE 3913 0.858

1377 CCGGCGGTG 2414 RSDALTR 2914 RSDELQR 3414 RSDDLRE 3914 0.012

1378 CCGGCGGTG 2415 RSDALTR 2915 RSDDLQR 3415 RSDDLRE 3915 0.012 

1379 CCGGCGGTG 2416 RSDALTR 2916 RSDERKR 3416 RSDDLRE 3916 0.25 

1380 GCCGACGGT 2417 QSSHLTR 2917 DRSNLTR 3417 ERGDLTR 3917 0.076

1381 GCCGACGGT 2418 QSSHLTR 2918 DPGNLVR 3418 ERGDLTR 3918 0.23

1382 GCCGACGGT 2419 QSSHLTR 2919 DRSNLTR 3419 DCRDLAR 3919 3.1

1383 GCCGACGGT 2420 QSSHLTR 2920 DPGNLVR 3420 DCRDLAR 3920 1.74
1384 GGTGTGGGC 2421 DRSHLTR 2921 RSDALSR 3421 MSHHLSR 3921 0.013

1385 TGGGCAAGA 2422 QSGHLNQ 2922 QSGSLTR 3422 RSDHLTT 3922 0.229

1386 TGGGCAAGA 2423 ENWKLQA 2923 QSGSLTR 3423 RSDHLTT 3923 0.193
1389 CTGGCCTGG 2424 RSDHLTT 2924 DCRDLAR 3424 RSDALRE 3924 0.175
1393 TGGGAAGCT 2425 QSSDLRR 2925 QSGNLAR 3425 RSDHLTT 3925 0.1
1394 TGGGAAGCT 2426 QSSDLRR 2926 QSGNLAR 3426 RSDHLTK 3926 0.04

1395 GAAGAGGGA 2427 QSGHLQR 2927 RSDNLAR 3427 QSGNLAR 3927 0.025

1396 GAAGAGGGA 2428 QRAHLAR 2928 RSDNLAR 3428 QSGNLAR 3928 0.107
1397 GAAGAGGGA 2429 QSSHLAR 2929 RSDNLAR 3429 QSGNLAR 3929 0.14

1398 TAATGGGGG 2430 RSDHLSR 2930 RSDHLTT 3430 QSGNLRT 3930 0.065 

1399 TGGGAGTGT 2431 TKQHLKT 2931 RSDNLAR 3431 RSDHLTT 3931 0.1 

1400 CCGGGTGAG 2432 RSDNLAR 2932 QSSHLAR 3432 RSDDLRE 3932 0.371 

1401 GAGTTGGCC 2433 ERGTLAR 2933 RADALMV 3433 RSDNLAR 3933 0.167 

1402 CTGGAGTTG 2434 RGDALTS 2934 RSDNLAR 3434 RSDALRE 3934 0.15

1403 ATGGCAATG 2435 RSDALTQ 2935 QSGSLTR 3435 RSDALTQ 3935 0.07 

1404 GAGGGAGGG 2436 RSDHLSR 2936 QSGSLTR 3436 RSDNLAR 3936 0.022 

1405 GAGGCAGGG 2437 RSDHLSR 2937 QSGDLTR 3437 RSDNLAR 3937 0.045 

1406 GAAGCGGAG 2438 RSDNLAR 2938 RSDELTR 3438 QSGNLAR 3938 0.025 

1407 GCGGGCGCA 2439 QSGSLTR 2939 DRSHLAR 3439 RSDERKR 3939 0.585 

1408 CCGGCAGGG 2440 RSDHLSR 2940 QSGSLTR 3440 RSDELRE 3940 0.305 

1409 CCGGCAGGG 2441 RSDHLSR 2941 QSGSLTR 3441 RSDDLRE 3941 0.153 

1410 CCGGCGGCG 2442 RSDELTR 2942 RSDELQR 3442 RSDELRE 3942 0.814 

1411 TGAGGCGAG 2443 RSDNLAR 2943 DRSHLAR 3443 QSGHLTK 3943 0.282 

1412 CTGGCCGTG 2444 RSDSLLR 2944 ERGTLAR 3444 RSDALRE 3944 0.172 

1413 CTGGCCGCG 2445 RSDELTR 2945 DRSDLTR 3445 RSDALRE 3945 0.152 

1414 CTGGCCGCG 2446 RSDELTR 2946 ERGTLAR 3446 RSDALRE 3946 0.914 

1415 GCGGCCGAG 2447 RSDNLAR 2947 DRSDLTR 3447 RSDELQR 3947 0.102 

1416 GCGGCCGAG 2448 RSDNLAR 2948 ERGTLAR 3448 RSDELQR 3948 0.153 

1417 GAGTTGGCC 2449 ERGTLAR 2949 RGDALTS 3449 RSDNLAR 3949 1.397 

1418 CTGGAGTTG 2450 RADALMV 2950 RSDNLAR 3450 RSDALRE 3950 0.241 

1422 GGGTCGGCG 2451 RSDELTR 2951 RSDDLTT 3451 RSDHLSR 3951 0.064 

1423 GGGTCGGCG 2452 RSDELTR 2952 RSDDLTK 3452 RSDHLSR 3952 0.034 

1424 CAGGGCCCG 2453 RSDELRE 2953 DRSHLAR 3453 RSDNLRE 3953 1.37

1427 CAGGGCCCG 2454 RSDDLRE 2954 DRSHLAR 3454 RSDNLTE 3954 0.271 

1428 TGAGGCGAG 2455 RSDNLAR 2955 DRSHLAR 3455 QSVHLQS 3955 0.102 

1429 TGAGGCGAG 2456 RSDNLAR 2956 DRSHLAR 3456 QSGHLTT 3956 0.074

1430 TCGGCCGCC 2457 ERGTLAR 2957 DRSDLTR 3457 RSDDLTK 3957 0.352 

1431 TCGGCCGCC 2458 ERGTLAR 2958 DRSDLTR 3458 RSDDLAS 3958 6.17 

1432 TCGGCCGCC 2459 ERGTLAR 2959 ERGTLAR 3459 RSDDLTK 3959 1.778 

1434 CTGGCCGTG 2460 RSDSLLR 2960 DRSDLTR 3460 RSDALRE 3960 0.051 

1435 TAATGGGGG 2461 RSDHLSR 2961 RSDHLTT 3461 QSGNLTK 3961 0.057

1436 TGGGAGTGT 2462 TSDHLAS 2962 RSDNLAR 3462 RSDHLTT 3962 0.026 

1439 GGAGTGTTA 2463 QRSALAS 2963 RSDALAR 3463 QSGHLQR 3963 0.075 

1440 GGAGTGTTA 2464 QSGALTK 2964 RSDALAR 3464 QSGHLQR 3964 0.035 

1441 ATAGCTGGG 2465 RSDHLSR 2965 QSSDLTR 3465 QSGALTQ 3965 0.262

1442 TGCTGGGCC 2466 ERGTLAR 2966 RSDHLTT 3466 DRSHLTK 3966 0.36 

1443 TGGAAGGAA 2467 QSGNLAR 2967 RSDNLTQ 3467 RSHHLTT 3967 0.22 

1444 TGGAAGGAA 2468 QSGNLAR 2968 RSDNLTQ 3468 RSSHLTT 3968 0.09 

1445 TGGAAGGAA 2469 QSGNLAR 2969 RLDNLTA 3469 RSHHLTT 3969 0.182 

1446 TGGAAGGAA 2470 QSGNLAR 2970 RLDNLTA 3470 RSSHLTT 3970 0.42

1454 GGAGAGGCT 2471 QSSDLRR 2971 RSDNLAR 3471 QSGHLQR 3971 0.01

1455 CGGGATGAA 2472 QSANLSR 2972 TSGNLVR 3472 RSDHLRE 3972 0.043

1456 GGAGAGGCT 2473 QSSDLRR 2973 RSDNLAR 3473 QRAHLAR 3973 0.016

1457 GCAGAGGAA 2474 QSANLSR 2974 RSDNLAR 3474 QSGSLTR 3974 0.014
1460 TTGGGGGAG 2475 RSDNLAR 2975 RSDHLTR 3475 RADALMV 3975 0.007
1461 GACGAGGAG 2476 RSANLAR 2976 RSDNLTR 3476 DRSNLTR 3976 0.014
1462 CGGGATGAA 2477 QSGNLAR 2977 TSGNLVR 3477 RSDHLRE 3977 0.05
1463 GAGGCTGTT 2478 TTSALTR 2978 QSSDLTR 3478 RSDNLAR 3978 0.003
1464 GACGAGGAG 2479 RSDNLAR 2979 RSDNLTR 3479 DRSNLTR 3979 0.002
1465 CTGGGAGTT 2480 TTSALTR 2980 QSGHLQR 3480 RSDALRE 3980 0.018
1466 CTGGGAGTT 2481 NRATLAR 2981 QSGHLQR 3481 RSDALRE 3981 0.017
1468 GGTGATGTC 2482 DRSALTR 2982 TSGNLVR 3482 MSHHLSR 3982 0.08
1469 GGTGATGTC 2483 DRSALTR 2983 TSGNLVR 3483 TSGHLVR 3983 0.28
1470 GGTGATGTC 2484 DRSALTR 2984 TSGNLVR 3484 QRAHLER 3984 0.156
1471 CTGGTTGGG 2485 RSDHLSR 2985 QSSALTR 3485 RSDALRE 3985 0.09
1472 TTGAAGGTT 2486 TTSALTR 2986 RSDNLTQ 3486 RADALMV 3986 3.22
1473 TTGAAGGTT 2487 TTSALTR 2987 RSDNLTQ 3487 RSDSLTT 3987 0.47
1474 TTGAAGGTT 2488 QSSALAR 2988 RSDNLTQ 3488 RADALMV 3988 1.39
1475 TTGAAGGTT 2489 QSSALAR 2989 RSDNLTQ 3489 RLHSLTT 3989 0.39
1476 TTGAAGGTT 2490 QSSALAR 2990 RSDNLTQ 3490 RSDSLTT 3990 0.305
1477 GCAGCCCGG 2491 RSDHLRE 2991 DRSDLTR 3491 QSGSLTR 3991 2.31
1479 GAAAGTTCA 2492 QSHDLTK 2992 MSHHLTQ 3492 QSGNLAR 3992 37.04
1480 GAAAGTTCA 2493 NKTDLGK 2993 TSGHLVQ 3493 QSGNLAR 3993 62.5
1481 GAAAGTTCA 2494 NKTDLGK 2994 TSDHLAS 3494 RSDELRE 3994 37.04
1482 CCGTGTGAC 2495 DRSNLTR 2995 TSDHLAS 3495 RSDELRE 3995 111.1
1483 CCGTGTGAC 2496 DRSNLTR 2996 MSHHLTT 3496 RSDELRE 3996 20.8
1484 GAAGTGGTA 2497 QSSSLVR 2997 RSDALSR 3497 QSGNLAR 3997 0.01
1485 AAGTGAGCT 2498 QSSDLRR 2998 QSGHLTT 3498 RSDNLTQ 3998 1.537
1486 GGGTTTGAC 2499 DRSNLTR 2999 TTSALAS 3499 RSDHLSR 3999 0.085
1487 TTGAAGGTT 2500 TTSALTR 3000 RSDNLTQ 3500 RLHSLTT 4000 0.188
1488 AAGTGGTAG 2501 QSSDLRR 3001 QSGHLTT 3501 RLDNRTQ 4001 5.64
1490 CTGGTTGGG 2502 RSDHLSR 3002 TSGSLTR 3502 RSDALRE 4002 0.04
1491 AAGGGTTCA 2503 NKTDLGK 3003 DSSKLSR 3503 RLDNRTA 4003 4.12
1492 AAGTGGTAG 2504 RSDNLTT 3004 RSDHLTT 3504 RSDNLTQ 4004 1.37
1493 AAGTGGTAG 2505 RSDNLTT 3005 RSDHLTT 3505 RLDNRTQ 4005 15.09
1494 GGGTTTGAC 2506 DRSNLTR 3006 QRSALAS 3506 RSDHLSR 4006 0.255
1496 TTGGGGGAG 2507 RSDNLAR 3007 RSDHLTR 3507 RSDALTT 4007 0.065
1497 GAGGCTCTT 2508 QSSALAR 3008 QSSDLTR 3508 RSDNLAR 4008 0.007
1498 GAGGTTGAT 2509 QSSNLAR 3009 QSSALTR 3509 RSDNLAR 4009 0.101
1499 GAGGTTGAT 2510 QSSNLAR 3010 TSGALTR 3510 RSDNLAR 4010 0.02
1500 GCAGAGGAA 2511 QSGNLAR 3011 RSDNLAR 3511 QSGSLTR 4011 0.003
1522 GCAATGGGT 2512 TSGHLVR 3012 RSDALTQ 3512 QSGDLTR 4012 0.08
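
For readers working with the listing above, the following is a minimal Python sketch of how one row might be read. It assumes (the column header is not reproduced in this part of the listing) that the ten whitespace-separated fields are: ZFP number, 9-base target site, SEQ ID NO, F1 recognition helix, SEQ ID NO, F2 helix, SEQ ID NO, F3 helix, SEQ ID NO, and a measured Kd. It further assumes that F3 pairs with the 5' triplet of the target and F1 with the 3' triplet. The names `ZfpRow` and `parse_row` are hypothetical and used only for illustration.

```python
from typing import Dict, NamedTuple, Tuple


class ZfpRow(NamedTuple):
    zfp: str                             # ZFP identifier (first column)
    target: str                          # 9-base target site, written 5'->3'
    fingers: Dict[str, Tuple[str, str]]  # finger -> (triplet, helix)
    kd: float                            # last column, assumed to be a Kd


def parse_row(line: str) -> ZfpRow:
    """Split one cleaned table row into its assumed ten fields."""
    fields = line.split()
    if len(fields) != 10:
        raise ValueError(f"expected 10 fields, got {len(fields)}: {line!r}")
    zfp, target = fields[0], fields[1]
    f1, f2, f3 = fields[3], fields[5], fields[7]
    # Assumption: F3 reads the 5' triplet, F2 the middle one, F1 the 3' one.
    triplets = [target[i:i + 3] for i in range(0, 9, 3)]
    fingers = {
        "F3": (triplets[0], f3),
        "F2": (triplets[1], f2),
        "F1": (triplets[2], f1),
    }
    return ZfpRow(zfp, target, fingers, float(fields[9]))


row = parse_row(
    "1312 GGGGCGTGA 2359 QSGHLTK 2859 RSDELQR 3359 RSDHLSR 3859 0.23"
)
for name in ("F1", "F2", "F3"):
    triplet, helix = row.fingers[name]
    print(name, triplet, helix)
```

Under these assumptions, ZFP 1312 pairs F1 (QSGHLTK) with TGA, F2 (RSDELQR) with GCG, and F3 (RSDHLSR) with GGG.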
FINGER (N→C)

TRIPLET (5'→3')   F1        F2            F3
AGG               -         -             RXDHXXQ
ATG               -         -             RXDAXXQ
CGG               -         -             RXDHXXE
GAA               -         QXGNXXR       -
GAC               DXSNXXR   -             DXSNXXR
GAG               RXDNXXR   RXDNXXR       RXDNXXR
GAT               TXSNXXR   TXGNXXR       -
GCA               QXGSXXR   QXGDXXR       -
GCC               EXGTXXR   -             -
GCG               RXDEXXR   RXDEXXR       RXDEXXR
                                          RXDTXXK
GCT               QXSDXXR   TXGEXXR       -
GGA               -         QXGHXXR       QXAHXXR
GGC               DXSHXXR   DXSHXXR       -
GGG               RXDHXXR   RXDHXXR       RXDHXXR
GGT               -         -             TXGHXXR
GTA               -         QXGSXXR       -
                            [illegible]
GTG               RXDAXXR   RXDAXXR       RXDAXXR
                  RXDSXXR
TAG               -         RXDNXXT       -
TCG               RXDDXXK   -             -
TGT               -         TXDHXXS
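
The consensus entries in the table above use X for any amino acid. As a minimal sketch, a seven-residue recognition helix from the preceding listing can be checked against such a consensus as follows. The helper names `consensus_to_regex` and `matches_consensus` are hypothetical, and the sketch assumes that every non-X letter must match exactly at its position.

```python
import re


def consensus_to_regex(consensus: str) -> "re.Pattern[str]":
    """Turn a consensus such as 'RXDHXXR' into a compiled regex.

    Assumption: 'X' stands for any single amino acid (one uppercase
    letter); every other character must match literally.
    """
    parts = ["[A-Z]" if ch == "X" else re.escape(ch) for ch in consensus]
    return re.compile("".join(parts))


def matches_consensus(consensus: str, helix: str) -> bool:
    """Return True if the whole helix sequence fits the consensus."""
    return consensus_to_regex(consensus).fullmatch(helix) is not None


# RSDHLSR fits the GGG consensus RXDHXXR; RSDHLTT does not, because
# its last residue is T rather than R.
print(matches_consensus("RXDHXXR", "RSDHLSR"))
print(matches_consensus("RXDHXXR", "RSDHLTT"))
```

Using `fullmatch` rather than `match` keeps a longer or shorter sequence from being accepted on the strength of a matching prefix.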